[Figure: the (in)dependence constraints x ⊥̸⊥ z, y ⊥̸⊥ z, x ⊥⊥ y are sufficient to determine the equivalence class — in this case, a unique causal graph.]

For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988; Meek 1995)
Staying in business

• Weaken the assumptions (and increase the equivalence class)
  - allow for unmeasured common causes
  - allow for cycles [Zhalama talk]
  - weaken faithfulness
• Exclude the limitations (and reduce the equivalence class)
  - restrict to non-Gaussian error distributions [Tank talk]
  - restrict to non-linear causal relations
  - restrict to specific discrete parameterizations
• Include more general data collection set-ups (and see how assumptions can be adjusted and what equivalence class results)
  - experimental evidence
  - multiple (overlapping) data sets
  - relational data
Limitations

For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988; Meek 1995)
Linear non-Gaussian method (LiNGaM)

• Linear causal relations:
  x_i = Σ_{x_j ∈ Pa(x_i)} β_ij x_j + ε_i
• Assumptions:
  - causal Markov
  - causal sufficiency
  - acyclicity
‣ If the error terms ε_i are non-Gaussian, then the true graph is uniquely identifiable from the joint distribution.

[Shimizu et al., 2006]
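As a concrete illustration, here is a minimal two-variable sketch of the LiNGaM idea — not Shimizu et al.'s ICA-based algorithm. It fits an OLS regression in both directions and checks which direction leaves a residual independent of the regressor; the absolute-value correlation is a crude assumed stand-in for a proper independence test such as HSIC (zero is necessary, not sufficient, for independence):

```python
# Two-variable LiNGaM-style direction check (sketch, not Shimizu et al. 2006).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.laplace(size=n)             # non-Gaussian cause
y = 0.8 * x + rng.laplace(size=n)   # true model: x -> y

def ols_residual(target, regressor):
    beta = np.cov(target, regressor)[0, 1] / np.var(regressor)
    return target - beta * regressor

def dependence(a, b):
    # correlation of absolute deviations: picks up the non-linear
    # dependence that plain correlation misses (crude HSIC stand-in)
    return abs(np.corrcoef(np.abs(a - a.mean()), np.abs(b - b.mean()))[0, 1])

print(dependence(ols_residual(y, x), x))  # ~0: residual independent of x
print(dependence(ols_residual(x, y), y))  # clearly > 0: backwards model fails
```

Only the causal direction leaves an independent residual, so the unique DAG is recovered.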
Two variable case

True model (x → y):   y = βx + ε_y,   with x ⊥⊥ ε_y

Backwards model (y → x):   x = θy + ε̃_x,   with y ⊥⊥ ε̃_x?

ε̃_x = x − θy
    = x − θ(βx + ε_y)
    = (1 − θβ)x − θε_y
Why Normals are unusual

Forwards model:   y = βx + ε_y
Backwards model:   ε̃_x = (1 − θβ)x − θε_y — is ε̃_x independent of y?

Theorem 1 (Darmois–Skitovich). Let X_1, ..., X_n be independent, non-degenerate random variables. If the two linear combinations
  l_1 = a_1 X_1 + ... + a_n X_n,   a_i ≠ 0
  l_2 = b_1 X_1 + ... + b_n X_n,   b_i ≠ 0
are independent, then each X_i is normally distributed.
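To spell out how the theorem applies here (a worked step added in the slide's notation, not on the original slide):

```latex
% y and the backwards residual are two linear combinations of the
% independent variables x and \varepsilon_y:
\begin{align*}
  y                     &= \beta\, x + 1 \cdot \varepsilon_y, \\
  \tilde{\varepsilon}_x &= (1 - \theta\beta)\, x - \theta\, \varepsilon_y.
\end{align*}
% For beta != 0, theta != 0 and theta*beta != 1 all four coefficients are
% non-zero, so Darmois--Skitovich gives: if y and the backwards residual
% were independent, then x and eps_y would both be Gaussian.
% Contrapositive: with non-Gaussian errors there is no backwards model
% with an independent error term.
```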
algorithm / assumption | PC/GES             | FCI | CCD | LiNGaM       | lvLiNGaM     | cyclic LiNGaM
Markov                 | ✓                  | ✓   | ✓   | ✓            | ✓            | ✓
faithfulness           | ✓                  | ✓   | ✓   | ✗            | ✗            | ~
causal sufficiency     | ✓                  | ✗   | ✓   | ✓            | ✗            | ✓
acyclicity             | ✓                  | ✓   | ✗   | ✓            | ✓            | ✗
parametric assumption  | ✗                  | ✗   | ✗   | linear       | linear       | linear
                       |                    |     |     | non-Gaussian | non-Gaussian | non-Gaussian
output                 | Markov equivalence | PAG | PAG | unique DAG   | set of DAGs  | set of graphs
Bivariate linear Gaussian case

True model:   x = ε_x,   y = x + ε_y,   with ε_x, ε_y ∼ indep. Gaussian

[Figure, panels a–c (graphics from Hoyer et al. 2009): scatter plot of the data; the forwards (true) model fit p(y|x); the backwards model fit p(x|y). Both directions fit equally well.]
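A minimal sketch of why the Gaussian case is symmetric: the backwards OLS residual is uncorrelated with the regressor by construction, and for jointly Gaussian variables uncorrelated already means independent, so the backwards model is just as valid:

```python
# Linear Gaussian non-identifiability demo (sketch).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)                 # true model: x -> y

theta = np.cov(x, y)[0, 1] / np.var(y)     # backwards regression coefficient
eps_bwd = x - theta * y                    # backwards residual
print(np.corrcoef(eps_bwd, y)[0, 1])       # ~0; Gaussian => independent,
                                           # so the direction is unidentifiable
```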
Continuous additive noise models

x_j = f_j(pa(x_j)) + ε_j

• If f_j(·) is linear, then non-Gaussian errors are required for identifiability
➡ What if the errors are Gaussian, but f_j(·) is non-linear?
➡ More generally, under what circumstances is the causal structure represented by this class of models identifiable?
Bivariate non-linear Gaussian additive noise model

True model:   x = ε_x,   y = x + x³ + ε_y,   with ε_x, ε_y ∼ indep. Gaussian

[Figure, panels d–f (graphics from Hoyer et al. 2009): scatter plot of the data; the forwards (true) model fit p(y|x); the backwards model fit p(x|y).]

Backwards model:   x = g(y) + ε̃_x,   with y ⊥̸⊥ ε̃_x
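A sketch in the spirit of Hoyer et al. (2009), not their exact method: fit a flexible regression in each direction and check whether the residual is independent of the regressor. The squared-value correlation below is an assumed, crude stand-in for a proper independence test such as HSIC:

```python
# Bivariate additive-noise direction check (sketch).
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
y = x + x**3 + rng.normal(size=n)   # true model: x -> y, non-linear, Gaussian noise

def residual(target, regressor, deg=7):
    z = (regressor - regressor.mean()) / regressor.std()  # stabilise polyfit
    return target - np.polyval(np.polyfit(z, target, deg), z)

def dependence(resid, regressor):
    # picks up heteroscedastic dependence that plain correlation misses
    return abs(np.corrcoef(resid**2, regressor**2)[0, 1])

print(dependence(residual(y, x), x))  # ~0: additive noise model fits forwards
print(dependence(residual(x, y), y))  # clearly > 0: no backwards ANM exists
```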
General non-linear additive noise models

Hoyer Condition (HC): a technical condition on the relation between the function, the noise distribution and the parent distribution that, if satisfied, permits a backwards model.

• If the error terms are Gaussian, then the only functional form that satisfies HC is linearity; otherwise the model is identifiable.
• If the errors are non-Gaussian, then there are (rather contrived) functions that satisfy HC, but in general identifiability is guaranteed.
  - this generalizes to multiple variables (assuming minimality*)!
  - extension to discrete additive noise models
• If the function is linear but the error terms are non-Gaussian, then one cannot fit a linear backwards model (LiNGaM), but there are cases where one can fit a non-linear backwards model.
algorithm / assumption | PC/GES             | FCI | CCD | LiNGaM       | lvLiNGaM     | cyclic LiNGaM | non-linear additive noise
Markov                 | ✓                  | ✓   | ✓   | ✓            | ✓            | ✓             | ✓
faithfulness           | ✓                  | ✓   | ✓   | ✗            | ✗            | ~             | minimality
causal sufficiency     | ✓                  | ✗   | ✓   | ✓            | ✗            | ✓             | ✓
acyclicity             | ✓                  | ✓   | ✗   | ✓            | ✓            | ✗             | ✓
parametric assumption  | ✗                  | ✗   | ✗   | linear       | linear       | linear        | non-linear
                       |                    |     |     | non-Gaussian | non-Gaussian | non-Gaussian  | additive noise
output                 | Markov equivalence | PAG | PAG | unique DAG   | set of DAGs  | set of graphs | unique DAG
Experiments, Background Knowledge and all the other Jazz

• how to integrate data from experiments?
  [Figure: experimental evidence turning an equivalence class over x, y, z with latents l1, l2 into a more specific structure]
• how to include background knowledge?
  - pathways, tier orderings (e.g. z < y), "priors"
  [Figures: small graphs over x, y, z, w illustrating each kind of knowledge]
• specific search space restrictions
  - biological settings
  - subsampled time series [Tank talk]
High-Level

• data: samples over x, y, z, w  →  (in)dependence constraints, e.g. x ⊥̸⊥ y | C
• assumptions, e.g.: causal Markov, causal faithfulness, etc.
• background knowledge, e.g.: pathways, tier orderings, "priors", etc.
• setting, e.g.: time series, internal latent structures, etc.

➡ Encode all of these as logical constraints on the underlying graph structure, and hand them to a (max) SAT-solver.
SAT-based Causal Discovery

• Formulate the independence constraints in propositional logic:
  x ⊥⊥ y ⟺ ¬A ∧ ¬B ∧ ...,   where A = 'x → y is present'
• Encode the constraints into one formula:
  ¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬...
• Find satisfying assignments using a SAT-solver:
  A = false, B = false, ... ⟺ [a graph over x, y, z]
➡ very general setting (allows for cycles and latents) and trivially complete
➡ BUT: erroneous test results induce conflicting constraints: UNsatisfiable
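A minimal sketch of the encoding step for the deck's running three-variable example (x ⊥⊥ y, x ⊥̸⊥ z, y ⊥̸⊥ z). It assumes the python-sat package is installed, and the clause encoding is deliberately simplified (marginal independence forbids a direct edge; dependence demands one) — a faithful encoding would have to exclude every d-connection pattern, including common-cause paths:

```python
# Simplified SAT encoding sketch (assumes: pip install python-sat).
# Edge variables: 1='x->y', 2='y->x', 3='x->z', 4='z->x', 5='y->z', 6='z->y'.
from pysat.solvers import Glucose3

edges = {1: 'x->y', 2: 'y->x', 3: 'x->z', 4: 'z->x', 5: 'y->z', 6: 'z->y'}
solver = Glucose3()
solver.add_clause([-1])     # x _||_ y : no edge x->y ...
solver.add_clause([-2])     # ... and no edge y->x
solver.add_clause([3, 4])   # x not _||_ z : some edge between x and z
solver.add_clause([5, 6])   # y not _||_ z : some edge between y and z

if solver.solve():
    model = solver.get_model()                 # one satisfying assignment
    print([edges[v] for v in model if v > 0])  # e.g. ['x->z', 'y->z']
else:
    print('UNSAT: conflicting constraints (e.g. from test errors)')
```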
Conflicts and Errors

• Statistical independence tests produce errors
➡ Conflict: no graph can produce the resulting set of constraints

[Figure: a set of (in)dependence constraints over x, y, z containing a conflict — e.g. both x ⊥⊥ y and x ⊥̸⊥ y appear — together with candidate graphs, none of which satisfies all constraints.]

➡ This motivates the "(max) SAT" option above: weight each constraint (e.g. by test reliability) and search for a graph satisfying a maximum-weight subset; see the sketch below.
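A sketch of that repair, again assuming the python-sat package (this illustrates the weighted MaxSAT idea, not any specific published algorithm): test results become weighted soft clauses, and the RC2 solver returns the assignment that violates the least total weight instead of failing:

```python
# Weighted MaxSAT sketch for conflicting constraints (assumes python-sat).
from pysat.examples.rc2 import RC2
from pysat.formula import WCNF

wcnf = WCNF()
wcnf.append([1], weight=3)   # soft: 'x _||_ y' encoded as var 1, reliability 3
wcnf.append([-1], weight=1)  # soft: a contradicting, less reliable test result
wcnf.append([2])             # hard: e.g. a background-knowledge constraint

with RC2(wcnf) as solver:
    print(solver.compute())  # e.g. [1, 2]: keeps the more reliable clause
    print(solver.cost)       # 1 = total weight of the soft clauses dropped
```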