What does 'strong causal influence' mean?

Dominik Janzing
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Joint work with David Balduzzi, Moritz Grosse-Wentrup and Bernhard Schölkopf

Quantifying the strength of an arrow
Given:
• a causally sufficient set of variables $X_1, \dots, X_n$
• the causal DAG $G$
• all causal conditionals $P(x_j \mid pa_j)$, even for values $pa_j$ with probability zero (this is more than just knowing $P(X_1, \dots, X_n)$)
[Figure: example DAG on $X_1, X_2, X_3, X_4$]
Goal: quantify the strength of $X_i \to X_j$.

Motivation
[Figure: DAGs on X, Y, Z, W]
Maybe the true causal DAG is always complete once we also account for weak interactions. Which of them are weak enough to be neglected?

Strength of a set of arrows
Idea:
• the strength of an arrow measures its relevance for understanding the behavior of the system under interventions
• the strength of a set of arrows measures the relevance of the set for understanding the behavior of the system under interventions
• even if each arrow in $S$ is irrelevant, $S$ as a whole could still be relevant

Note:
[Figure: DAGs on X, Y, Z, W]
This picture is misleading because, for a set $S$ of arrows,
• each element may have negligible strength
• but jointly they may not be negligible.
Our causal strength will not be subadditive over the edges!

Information-theoretic approach
Advantages of information theory:
• variables may have different domains
• quantities are invariant under rescaling
• related to thermodynamics
• better suited to non-statistical generalizations
We therefore don't consider approaches that involve expectations, variances, etc. (ANOVA, ACE, ...).

Some related work
• Avin, Shpitser, Pearl: Identifiability of path-specific effects, 2005.
• Pearl: Direct and indirect effects, 2001.
• Robins, Greenland: Identifiability and exchangeability of direct and indirect effects, 1992.
• Holland: Causal inference, path analysis, and recursive structural equation models, 1988.
These do not achieve our goal because they
• measure the impact on $Y$ of switching $X$ from $x$ to $x'$ for one particular pair $(x, x')$ when other paths are blocked,
• whereas we want an overall score for the strength of $X \to Y$ without referring to particular pairs.

Axiomatic approach
Let $S$ be a set of arrows.
• Let $C_S$ denote its strength.
• Postulate desired properties of $C_S$.

Postulate 0 (causal Markov condition): if $C_S = 0$, then $P$ is also Markov with respect to $G_S$, the DAG obtained by removing all arrows in $S$.
[Figure: DAGs $G$ and $G_S$ on X, Y, Z]

Postulate 1 (mutual information): for the simple DAG $X \to Y$ we postulate $C_{X \to Y} = I(X;Y)$.
(All of the dependence is due to the influence of $X$ on $Y$, hence the strength of the dependence can serve as a measure of the strength of the influence.)
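To make Postulate 1 concrete, here is a minimal Python sketch that computes $I(X;Y)$ in bits from $P(X)$ and the causal conditional $P(Y \mid X)$ of the DAG $X \to Y$. The noisy-copy example and the helper names `mutual_information` and `joint_from_mechanism` are illustrative assumptions, not part of the slides.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)


def joint_from_mechanism(p_x, p_y_given_x):
    """P(x, y) = P(x) P(y | x) for the DAG X -> Y."""
    return {(x, y): px * pyx
            for x, px in p_x.items()
            for y, pyx in p_y_given_x[x].items()}


# Hypothetical example: X is a fair bit, Y copies X but is flipped with probability 0.1.
flip = 0.1
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 1 - flip, 1: flip}, 1: {0: flip, 1: 1 - flip}}
strength = mutual_information(joint_from_mechanism(p_x, p_y_given_x))
print(f"C_(X->Y) = I(X;Y) = {strength:.3f} bits")  # 1 - H(0.1), about 0.531
```

For a noiseless copy (`flip = 0`) the same code gives exactly 1 bit, the number of bits $Y$ takes from $X$.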

Alternative option: for the DAG $X \to Y$, define $C_{X \to Y}$ as the capacity of the information channel $P(Y \mid do(X)) = P(Y \mid X)$, i.e., the maximum of $I(X;Y)$ over all possible input distributions $Q(X)$.
• requires knowing $P(Y \mid x)$ also for $x$-values that never or seldom occur
• quantifies the potential influence rather than the actual one
• nevertheless an interesting option
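The capacity has no closed form in general, but it can be computed numerically. The sketch below uses the Blahut-Arimoto algorithm, a standard iterative method for channel capacity; it is our choice of solver, not something the slides specify. The example channel is hypothetical: inputs 0 and 1 leave $Y$ unaffected, while input 2 (assumed to occur rarely or never) determines $Y$.

```python
import math


def channel_capacity(channel, iterations=200):
    """Capacity in bits of a discrete channel via the Blahut-Arimoto algorithm.

    channel[x][y] = P(y | do(x)); each row must sum to 1.
    Returns (capacity, maximizing input distribution Q(X)).
    """
    n_in, n_out = len(channel), len(channel[0])

    def output_dist(q):
        # distribution of Y induced by the input distribution q
        return [sum(q[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]

    def divergences(r):
        # D(P(Y|x) || r) in bits for every input value x
        return [sum(channel[x][y] * math.log2(channel[x][y] / r[y])
                    for y in range(n_out) if channel[x][y] > 0)
                for x in range(n_in)]

    q = [1.0 / n_in] * n_in  # start from the uniform input distribution
    for _ in range(iterations):
        d = divergences(output_dist(q))
        weights = [q[x] * 2.0 ** d[x] for x in range(n_in)]
        total = sum(weights)
        q = [w / total for w in weights]
    d = divergences(output_dist(q))
    capacity = sum(q[x] * d[x] for x in range(n_in))  # equals I(X;Y) under Q
    return capacity, q


# Hypothetical channel: inputs 0 and 1 leave Y = 0, input 2 sets Y = 1.
channel = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
capacity, q_opt = channel_capacity(channel)
print(f"capacity = {capacity:.3f} bits, optimal Q(X) = {[round(v, 3) for v in q_opt]}")
```

If the observed $P(X)$ puts no mass on input 2, the actual $I(X;Y)$ is 0, yet the capacity is 1 bit, reached by a $Q(X)$ that uses input 2 half of the time. This is exactly the "potential rather than actual influence" caveat listed above.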

Potential strength vs. actual strength
Assume a medical study shows that
• changing cholesterol within the range of values occurring in humans has no impact on life expectancy
• increasing it to 10 times the highest observed value has a strong impact
Which statement would you prefer?
• "cholesterol has a strong impact on life expectancy"
• "cholesterol would have a strong impact on life expectancy if it were much higher than it is"

Postulate 2 (locality): $C_{X \to Y}$ is determined by $P(Y \mid PA_Y)$ and $P(PA_Y)$.
[Figure: two DAGs on X, Y, Z]
$Z$ is irrelevant in both cases.

Postulate 3 (quantitative causal Markov condition): $C_{X \to Y} \ge I(X; Y \mid PA_Y^{X})$, where $PA_Y^{X}$ denotes the parents of $Y$ without $X$.
Idea: removing $X \to Y$ would imply $I(X; Y \mid PA_Y^{X}) = 0$.
[Figure: DAGs with and without the arrow $X \to Y$]
No other arrow can generate a non-zero dependence $I(X; Y \mid PA_Y^{X})$.

Postulate 4 (heredity): if $T \supset S$, then $C_T = 0 \Rightarrow C_S = 0$ (subsets of irrelevant sets of arrows are irrelevant).

Apart from the postulates...
consider a simple communication scenario for which we might agree on what $C$ should be.

Toy model with partial copy operations:
• each variable $X_j$ consists of $k_j$ bits
• some of the bits are set uniformly at random
• the remaining ones are copied from the parents
i.e., a structural equation model $X_j = f_j(PA_j, U_j)$ where
• every $X_j$ and $U_j$ is a vector of bits
• every $f_j$ is a restriction map

Example with $X \to Y$
1. $X$ sets all its bits randomly (in the figure, $X = 10110$).
2. $Y$ copies some of them (here three bits, $110$).
3. $Y$ sets the remaining ones randomly (here one bit, giving $Y = 1101$).

Do we agree that...
...$C_{X \to Y}$ should be the number of bits that $Y$ takes from $X$? (For the simple DAG $X \to Y$ this number equals $I(X;Y)$.)
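For the partial-copy toy model, the claim that $I(X;Y)$ equals the number of copied bits can be checked by exact enumeration. In this sketch the sizes mirror the example above (five bits in $X$, three of them copied into a four-bit $Y$); which bit positions are copied is a hypothetical choice, and the function names are ours.

```python
import itertools
import math
from collections import defaultdict


def toy_joint(k_x, k_y, copied):
    """Exact P(x, y) for the partial-copy model X -> Y.

    X consists of k_x uniformly random bits. The first len(copied) bits of Y
    are copied from the positions of X listed in `copied`; the remaining
    k_y - len(copied) bits of Y are set uniformly at random.
    """
    m = len(copied)
    weight = 2.0 ** -(k_x + k_y - m)  # P(x) * P(random bits of Y)
    joint = defaultdict(float)
    for x in itertools.product((0, 1), repeat=k_x):
        for noise in itertools.product((0, 1), repeat=k_y - m):
            y = tuple(x[i] for i in copied) + noise
            joint[(x, y)] += weight
    return joint


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)


# X has 5 bits; Y copies bits 0, 2 and 3 of X and adds one random bit.
bits_taken = mutual_information(toy_joint(5, 4, copied=(0, 2, 3)))
print(f"I(X;Y) = {bits_taken:.6f} bits")  # 3 bits, the number of copied bits
```

The random bits of $Y$ contribute nothing to $I(X;Y)$, so only the copied bits are counted.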

Why $I(X;Y)$ is an inappropriate measure for general DAGs
[Figure: panels a) and b), DAGs on X, Y, Z]
$I(X;Y)$ doesn't account for the fact that part of the dependence is due to
a) the confounder $Z$
b) the indirect influence via $Z$

First guess: $I(X;Y \mid Z)$
[Figure: panels a) and b), DAGs on X, Y, Z]
• qualitatively, it behaves correctly: it screens off the path involving $Z$
• quantitatively, it is wrong because...

Fails even for a simple copy scenario
[Figure: steps 1) to 4) in which a single bit is copied among Z, X and Y]
• $I(X;Y \mid Z) = 0$ because $X$ and $Y$ are constant when conditioned on $Z$
• we would like to have $C_{X \to Y} = 1$
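The failure of $I(X;Y \mid Z)$ can be reproduced numerically. The sketch assumes one reading of the copy scenario: $Z$ is a uniform random bit, $X$ copies it, and $Y$ copies one bit from $X$ and one from $Z$, so the arrow $X \to Y$ carries one bit. The function name is ours.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """I(X;Y|Z) in bits for a joint distribution {(x, y, z): probability}."""
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    return sum(p * math.log2(p * p_z[z] / (p_xz[(x, z)] * p_yz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)


# Assumed copy scenario: Z uniform, X = Z, Y = (bit copied from X, bit copied from Z).
joint = defaultdict(float)
for z in (0, 1):
    x = z
    y = (x, z)
    joint[(x, y, z)] += 0.5

print(f"I(X;Y|Z) = {conditional_mutual_information(joint):.3f} bits")
```

Conditioned on $Z$, both $X$ and $Y$ are constant, so the result is 0 bits even though one bit flows along $X \to Y$. Randomizing $X$ according to $P(X \mid Z)$ reproduces exactly this joint distribution, since $X$ is again a copy of $Z$.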

Why $I(X;Y \mid Z)$ is inappropriate
[Figure: panels a) and b), DAGs on X, Y, Z]
Weakening $Z \to Y$ converts a) into b), where $C_{X \to Y} = I(X;Y)$.

Idea: measure the strength of $X$ on $Y$ by the impact of interventions on $X$ (while adjusting other variables)
• formalized by Ay & Polani (2006) in terms of Pearl's do-calculus
• they defined a family of information-theoretic quantities called "information flow"

This does not solve our problem:
• Ay and Polani's information flow measures an interesting quantity (something related to causality)
• but we don't consider it a good measure for the strength of an arrow
• arguments follow

First attempt:
[Figure: DAG a) on X, Y, Z]
The strength of $X \to Y$ is the mutual information $I(X;Y)$ in a scenario where
• $X$ is subjected to a randomized intervention

Fails because...
[Figure: DAG on X, Y, Z]
• $X$, $Y$, $Z$ binary
• $P(Z)$ uniform
• $Y = X \oplus Z$
$X$ and $Y$ are independent with respect to both the
• observed distribution
• distribution obtained by randomizing $X$
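A short exact computation illustrates this example. The sketch assumes that $X$ is independent of $Z$ (no arrow $Z \to X$) and uniform in the observed distribution; the non-uniform randomization of $X$ is a hypothetical choice to show that the conclusion does not depend on it. For contrast it also evaluates $I(X;Y \mid Z)$, the lower bound required by Postulate 3.

```python
import math
from collections import defaultdict


def mutual_information_xy(joint):
    """I(X;Y) in bits from a joint {(x, y, z): probability}, marginalizing Z."""
    p_xy, p_x, p_y = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, _z), p in joint.items():
        p_xy[(x, y)] += p
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)


def conditional_mutual_information(joint):
    """I(X;Y|Z) in bits for a joint distribution {(x, y, z): probability}."""
    p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    return sum(p * math.log2(p * p_z[z] / (p_xz[(x, z)] * p_yz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)


def xor_joint(q_x):
    """P(x, y, z) = P(z) q(x) P(y | x, z) with Z uniform and Y = X XOR Z.

    With q_x equal to the observed P(X) this is the observational distribution;
    with any other q_x it is the distribution after randomizing X according to q_x.
    """
    joint = defaultdict(float)
    for z in (0, 1):
        for x, qx in q_x.items():
            joint[(x, x ^ z, z)] += 0.5 * qx
    return joint


observed = xor_joint({0: 0.5, 1: 0.5})    # assumed observed P(X): uniform
randomized = xor_joint({0: 0.8, 1: 0.2})  # hypothetical randomization of X
# Both I(X;Y) values should be 0 up to floating-point rounding.
print(mutual_information_xy(observed), mutual_information_xy(randomized))
print(conditional_mutual_information(observed))  # 1 bit
```

Any measure built on $I(X;Y)$ therefore assigns strength 0 to $X \to Y$, although $Y$ depends on $X$ deterministically once $Z$ is fixed.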

Second attempt:
[Figure: DAG a) on X, Y, Z]
The strength of $X \to Y$ is the conditional mutual information $I(X;Y \mid Z)$ in a scenario where
• $X$ is subjected to a randomized intervention
Question: according to which distribution is $X$ randomized?

Second attempt, Version I
[Figure: DAG a) on X, Y, Z]
The strength of $X \to Y$ is the conditional mutual information $I(X;Y \mid Z)$ in a scenario where
• $X$ is subjected to a randomized intervention
• $X$ is distributed according to $P(X \mid Z)$

Fails because...
[Figure: DAG on X, Y, Z in which X is a copy of Z]
If $X$ is a copy of $Z$, then
• given $Z$, $X$ is a constant
• hence $I(X;Y \mid Z) = 0$ also for the post-interventional distribution

Second attempt, Version II
[Figure: DAG a) on X, Y, Z]
The strength of $X \to Y$ is the conditional mutual information $I(X;Y \mid Z)$ in a scenario where
• $X$ is subjected to a randomized intervention
• $X$ is distributed according to $P(X)$

Violates Postulate 3:
[Figure: DAG a) on X, Y, Z]
There is a contrived example where the strength of $X \to Y$ would be smaller than $I(X;Y \mid Z)$.

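Violates Postulate 3:
[Figure: $Z$ is a random bit; $X$ and $Y$ consist of $k$ bits each]
• $X$: randomized for $Z = 1$, set to zero for $Z = 0$
• $Y$: copied from $X$ for $Z = 1$, set to 1 for $Z = 0$
$I(X;Y \mid Z) = k/2$ because $k$ random bits are copied in half of the cases. For $X$ randomized independently of $Z$, the copying of random bits occurs in only 1/4 of the cases, so the post-interventional value is only about $k/4$ for large $k$.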
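The numbers in this counterexample can be checked by exact enumeration for small $k$. The sketch follows the reading of the slide given above and implements Version II by drawing $X$ from its observed marginal $P(X)$ (all zeros with probability 1/2, uniform otherwise), independently of $Z$. Function names are ours.

```python
import itertools
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """I(X;Y|Z) in bits for a joint distribution {(x, y, z): probability}."""
    p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    return sum(p * math.log2(p * p_z[z] / (p_xz[(x, z)] * p_yz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)


def mechanism_y(x, z, k):
    """Y copies X when Z = 1 and has all bits set to 1 when Z = 0."""
    return x if z == 1 else (1,) * k


def observational(k):
    """Z is a fair bit; X is uniform on k bits for Z = 1 and all zeros for Z = 0."""
    joint = defaultdict(float)
    zeros = (0,) * k
    joint[(zeros, mechanism_y(zeros, 0, k), 0)] += 0.5
    for x in itertools.product((0, 1), repeat=k):
        joint[(x, mechanism_y(x, 1, k), 1)] += 0.5 * 2.0 ** -k
    return joint


def version_two(k):
    """Randomize X according to its observed marginal P(X), independently of Z."""
    p_x = defaultdict(float)
    for (x, _y, _z), p in observational(k).items():
        p_x[x] += p
    joint = defaultdict(float)
    for z in (0, 1):
        for x, px in p_x.items():
            joint[(x, mechanism_y(x, z, k), z)] += 0.5 * px
    return joint


for k in range(1, 7):
    obs = conditional_mutual_information(observational(k))
    post = conditional_mutual_information(version_two(k))
    print(f"k={k}: observed I(X;Y|Z) = {obs:.3f}, after Version II = {post:.3f}")
```

The observed value is exactly $k/2$, while the post-interventional value equals half the entropy of $P(X)$ and stays below $k/2$ for every $k$ tried, so a strength defined this way would violate Postulate 3.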

Hence...
• defining the strength of an arrow by interventions on nodes seems difficult
• we now define the strength by interventions on edges

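Our approach: measure the impact of 'deleting arrows'
To define the strength of $S$, cut every edge in $S$ and feed the open end with an independent copy of its source variable.
[Figure: DAG on X, Y, Z; the arrows $X \to Y$ and $Z \to Y$ form $S$ and are cut, with their open ends fed from $P(X)$ and $P(Z)$]
This defines a new distribution
$$P_S(x, y, z) := P(x, z) \sum_{x', z'} P(y \mid x', z')\, P(x')\, P(z')$$
and the strength
$$C_S := D(P \,\|\, P_S).$$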
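The definition translates into a short computation. As an assumption, this sketch is restricted to the three-variable DAG $Z \to X$, $Z \to Y$, $X \to Y$ with $S$ a subset of the arrows into $Y$. With both arrows cut it evaluates the formula above; with one arrow cut, only that input is averaged over its marginal. The causal conditional of $Y$ is passed as a function, so $P(y \mid x', z')$ is available even for parent values of probability zero, as required on the first slide. Names and examples are ours.

```python
import math
from collections import defaultdict


def edge_deletion_strength(p_z, p_x_given_z, mech_y, cut_x, cut_z):
    """C_S = D(P || P_S) in bits for the DAG Z -> X, Z -> Y, X -> Y.

    p_z:          {z: P(z)}
    p_x_given_z:  {z: {x: P(x | z)}}
    mech_y(x, z): {y: P(y | x, z)}, defined for every (x, z), even if P(x, z) = 0
    cut_x/cut_z:  whether X -> Y and/or Z -> Y belong to S
    Each cut arrow is fed with an independent draw from the observed marginal
    of its source, so P_S(x, z) = P(x, z) and only the conditional of Y changes.
    """
    p_x = defaultdict(float)
    for z, pz in p_z.items():
        for x, pxz in p_x_given_z[z].items():
            p_x[x] += pz * pxz

    def p_y_cut(x, z):
        # P_S(y | x, z): average the mechanism over the inputs of the cut wires
        xs = list(p_x.items()) if cut_x else [(x, 1.0)]
        zs = list(p_z.items()) if cut_z else [(z, 1.0)]
        out = defaultdict(float)
        for xi, wx in xs:
            for zi, wz in zs:
                for y, py in mech_y(xi, zi).items():
                    out[y] += wx * wz * py
        return out

    divergence = 0.0
    for z, pz in p_z.items():
        for x, pxz in p_x_given_z[z].items():
            if pz * pxz == 0:
                continue
            cut = p_y_cut(x, z)
            for y, py in mech_y(x, z).items():
                if py > 0:
                    divergence += pz * pxz * py * math.log2(py / cut[y])
    return divergence


# Copy scenario: Z is a uniform bit, X copies Z, Y copies one bit from X and one from Z.
fair = {0: 0.5, 1: 0.5}
copy_x = {0: {0: 1.0}, 1: {1: 1.0}}


def copy_both(x, z):
    return {(x, z): 1.0}


print(edge_deletion_strength(fair, copy_x, copy_both, cut_x=True, cut_z=False))  # 1 bit
print(edge_deletion_strength(fair, copy_x, copy_both, cut_x=True, cut_z=True))   # 2 bits
```

In the copy scenario this yields $C_{X \to Y} = 1$ bit, the value that $I(X;Y \mid Z)$ failed to deliver, and 2 bits when both arrows into $Y$ are cut: one corrupted bit per cut wire. Because the mechanism is queried at $(x', z')$ pairs that never occur jointly (here $x' \ne z'$), knowing only the observational $P(X, Y, Z)$ would not suffice.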

Idea of 'edge deletion'
[Figure: DAG on X, Y, Z with the cut arrows into Y fed from P(X) and P(Z)]
• edges are electrical wires
• an attacker cuts some wires
• and feeds the open ends with random input
• the distribution of the input is chosen to be the observed marginal distribution
• this is the only distribution that is locally accessible

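Why a product distribution?
[Figure: our edge deletion feeds Y with $P(X)P(Z)$; 'source exclusion' by Ay & Krakauer (2006) feeds it with $P(X,Z)$]
Feeding the joint distribution $P(X,Z)$, as in source exclusion:
• is not accessible to a local attacker
• makes Postulate 4 fail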

Applying our measure to our toy model
[Figure: bit-level copy model on Z, X, Y with the arrows in S cut]
$D(P \,\|\, P_S)$ = number of corrupted bits (in agreement with what we expect)

Quantifying the impact of a vaccine
[Figure: DAG on Age, 'vaccinated or not' and 'infected or not']
$P_S$ corresponds to an experiment where
• the vaccine is randomly redistributed regardless of Age (keeping the fraction of treated subjects)
• the random variable 'vaccinated' is reinterpreted as 'intention to get vaccinated'
