Type-II errors of independence tests can lead to arbitrarily large errors in estimated causal effects: an illustrative example
Nicholas Cornia & Joris M. Mooij, University of Amsterdam
Workshop UAI 2014, 27/07/2014
Outline
1. Problem setting
2. Estimation of the causal effect error from the observed covariance matrix
3. Discussion
4. Conclusions and future work
Introduction
Task: inferring causation from observational data.
Challenge: the presence of hidden confounders.
Approach: causal discovery algorithms based on conditional independence (CI) tests.
Simplest case: three random variables and a single CI test (the LCD-Trigger setting).
Contribution: causal predictions are extremely unstable when type II errors arise.
LCD-Trigger Algorithm
Cooper (1997) and Chen et al. (2007).
The causal model $X_1 \to X_2 \to X_3$ is implied by the following.
Prior assumptions: no selection bias; acyclicity; faithfulness; $X_2$ and $X_3$ do not cause $X_1$.
Statistical tests: $X_1 \not\perp\!\!\!\perp X_2$; $X_2 \not\perp\!\!\!\perp X_3$; $X_1 \perp\!\!\!\perp X_3 \mid X_2$.
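The three test outcomes listed above can be checked mechanically. The sketch below is only an illustration: it assumes Fisher z-tests on (partial) correlations, which the slides do not specify, and it plugs a population covariance matrix into the test as if it had been estimated from `n` samples. The function names and the numerical values are hypothetical.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation = 0 (Fisher z-test)."""
    r = min(max(float(r), -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def lcd_pattern_holds(Sigma, n, alpha=0.05):
    """Check the three LCD test outcomes on a covariance matrix of (X1, X2, X3)."""
    S = np.asarray(Sigma, dtype=float)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)  # correlation matrix
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    # Partial correlation of X1 and X3 given X2.
    r13_given_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
    dep_12 = fisher_z_pvalue(r12, n, 0) < alpha             # X1 and X2 dependent
    dep_23 = fisher_z_pvalue(r23, n, 0) < alpha             # X2 and X3 dependent
    indep_13_given_2 = fisher_z_pvalue(r13_given_2, n, 1) >= alpha  # independence not rejected
    return dep_12 and dep_23 and indep_13_given_2

# Covariance implied by X1 = E1, X2 = 0.8 X1 + E2, X3 = 0.5 X2 + E3
# with unit noise variances (illustrative values, not from the slides).
Sigma_chain = np.array([[1.0, 0.8, 0.4],
                        [0.8, 1.64, 0.82],
                        [0.4, 0.82, 1.41]])
print(lcd_pattern_holds(Sigma_chain, n=500))
```

Note that the third check only fails to reject independence; it is exactly this step that is exposed to type II errors.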
Application of the LCD in biology
Example (gene expression): $\mathrm{SNP} \to G \to P$, where SNP is a single nucleotide polymorphism, $G$ a gene expression level, and $P$ a phenotype.
Example (disease treatment): $X \to Y \to Z$, where $X$ is gender, $Y$ is disease 1, and $Z$ is disease 2.
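Linear Gaussian model
For simplicity we consider the linear-Gaussian case. Structural equations:
$$X_i = \sum_{j \neq i} \alpha_{ji} X_j + E_i, \qquad X = A^\top X + E,$$
where $E \sim \mathcal{N}(0, \Delta)$, $\Delta = \mathrm{diag}(\delta_i^2)$, and $A = \{\alpha_{ij}\}$ is the weighted adjacency matrix of the causal graph ($\alpha_{ij} \neq 0 \iff X_i \to X_j$).
Example: for the chain $X_1 \xrightarrow{\alpha_{12}} X_2 \xrightarrow{\alpha_{23}} X_3$,
$$X_1 = E_1, \qquad X_2 = \alpha_{12} X_1 + E_2, \qquad X_3 = \alpha_{23} X_2 + E_3.$$
Then $X \sim \mathcal{N}(0, \Sigma)$ with $\Sigma = \Sigma(A, \Delta)$.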
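To make the dependence $\Sigma = \Sigma(A, \Delta)$ concrete, here is a minimal sketch that computes the implied covariance for the chain example. It assumes the convention used in the example, namely that `A[i, j]` holds the weight of the edge from the i-th to the j-th variable; the function name and the parameter values are illustrative only.

```python
import numpy as np

def implied_covariance(A, delta2):
    """Covariance of X for X = A.T @ X + E with E ~ N(0, diag(delta2)).

    Assumed convention: A[i, j] is the weight of the edge X_{i+1} -> X_{j+1}.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Solving X = A.T X + E for X gives X = (I - A.T)^{-1} E.
    mixing = np.linalg.inv(np.eye(n) - A.T)
    return mixing @ np.diag(delta2) @ mixing.T

# Chain X1 -> X2 -> X3 with made-up parameter values.
a12, a23 = 0.8, 0.5
A_chain = np.array([[0.0, a12, 0.0],
                    [0.0, 0.0, a23],
                    [0.0, 0.0, 0.0]])
Sigma_chain = implied_covariance(A_chain, [1.0, 1.0, 1.0])
print(Sigma_chain)
```

Each entry of the result is a polynomial in the edge weights and noise variances, which is what makes the later constraint equations polynomial.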
Causal effect estimator
Causal effect of $X_2$ on $X_3$:
$$A \ni \alpha_{23} = \frac{\partial}{\partial x_2} \mathbb{E}[X_3 \mid do(X_2 = x_2)].$$
Under the LCD assumptions, the slope of $\mathbb{E}[X_3 \mid X_2 = x_2]$ in $x_2$, that is $\Sigma_{32} / \Sigma_{22}$, is a valid estimator for the causal effect of $X_2$ on $X_3$.
Example.
Structural equations (observed): $X_1 = E_1$, $X_2 = \alpha_{12} X_1 + E_2$, $X_3 = \alpha_{23} X_2 + E_3$.
Structural equations after an intervention $do(X_2 = x_2)$: $X_1 = E_1$, $X_2 = x_2$, $X_3 = \alpha_{23} x_2 + E_3$.
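A small simulation can illustrate why the regression slope is valid in the chain. The sketch below samples from the observed structural equations and from the intervened ones, then compares the two slopes. The coefficients, unit noise variances, sample size, and seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a12, a23 = 0.8, 0.5  # illustrative values, not from the slides

def sample_chain(n, x2=None):
    """Sample (X1, X2, X3) from the chain; if x2 is given, apply do(X2 = x2)."""
    e1, e2, e3 = rng.standard_normal((3, n))
    X1 = e1
    X2 = a12 * X1 + e2 if x2 is None else np.full(n, float(x2))
    X3 = a23 * X2 + e3
    return X1, X2, X3

# Observational estimate: Sigma_32 / Sigma_22 from the sample covariance.
_, x2_obs, x3_obs = sample_chain(n)
cov = np.cov(np.vstack([x2_obs, x3_obs]))
slope_obs = cov[0, 1] / cov[0, 0]

# Interventional slope: change in E[X3 | do(X2 = x2)] per unit change of x2.
slope_do = sample_chain(n, x2=1.0)[2].mean() - sample_chain(n, x2=0.0)[2].mean()

print(slope_obs, slope_do)
```

Both quantities should come out close to `a23`, up to sampling noise, because nothing other than $X_2$ links $X_1$ or the noise terms to $X_3$ in this model.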
Fundamental question
What happens to the error in the causal effect estimator if in reality there is a weak dependence $X_1 \not\perp\!\!\!\perp X_3 \mid X_2$, but we do not have enough data to detect it?
Type II error: erroneously accepting the null hypothesis of independence in the statistical test of $X_1 \perp\!\!\!\perp X_3 \mid X_2$.
Can we still guarantee some kind of bound on the distance
$$\left| \mathbb{E}[X_3 \mid X_2] - \mathbb{E}[X_3 \mid do(X_2)] \right| \; ?$$
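To give a feel for "not enough data", the following sketch estimates how many samples a test needs before a weak partial correlation becomes detectable. It assumes a Fisher z-test, which is not stated in the slides, and uses the crude criterion that the expected test statistic just reaches the critical value (roughly 50% power). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def samples_to_detect(rho, alpha=0.05, n_cond=1):
    """Rough sample size at which a Fisher z-test starts to reject independence
    when the true (partial) correlation is rho, given n_cond conditioning variables."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z_crit / math.atanh(abs(rho))) ** 2 + n_cond + 3)

for rho in (0.2, 0.05, 0.01):
    print(rho, samples_to_detect(rho))
```

Below such sample sizes the test will usually accept independence, which is the type II error scenario considered here.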
From LCD to our model
We start from the chain $X_1 \to X_2 \to X_3$, for which $X_1 \perp\!\!\!\perp X_3 \mid X_2$.
If we allow for a weak dependence that our test fails to detect, the causal graph becomes more complex: $X_1 \to X_2 \to X_3$ together with $X_2 \leftarrow X_4 \to X_3$, for which $X_1 \not\perp\!\!\!\perp X_3 \mid X_2$. Here $X_4$ is a confounding variable between $X_2$ and $X_3$.
True model
Graph: $X_1 \to X_2 \to X_3$ with $X_4$ a confounder of $X_2$ and $X_3$.
Prior assumptions: no selection bias; acyclicity; faithfulness; $X_2$ and $X_3$ do not cause $X_1$; no confounders between $X_1$ and $X_2$, or $X_3$, or both (for simplicity).
Statistical tests: $X_1 \not\perp\!\!\!\perp X_2$; $X_2 \not\perp\!\!\!\perp X_3$; a weak conditional dependence $X_1 \not\perp\!\!\!\perp X_3 \mid X_2$.
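Causal effect estimation error function
Belief: $X_1 \to X_2 \to X_3$, for which $\alpha_{23} = \Sigma_{32} / \Sigma_{22}$.
True model: the same chain with the confounder $X_4$ of $X_2$ and $X_3$, for which $\alpha_{23} \neq \Sigma_{32} / \Sigma_{22}$.
Error in the causal effect estimation function:
$$g(A, \Sigma) = \frac{\Sigma_{32}}{\Sigma_{22}} - \alpha_{23}.$$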
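The error function can be evaluated directly once the true parameters are fixed. The sketch below builds the covariance of the observed variables for a model with a hidden confounder $X_4$ and computes $g$. All parameter names and values are hypothetical, and the optional direct edge `a13` from $X_1$ to $X_3$ is an assumption of this sketch, switched off by default.

```python
import numpy as np

def observed_covariance(a12, a23, a42, a43, d1, d2, d3, d4=1.0, a13=0.0):
    """Covariance of the observed (X1, X2, X3); the hidden X4 is marginalised out."""
    # Variable order: X1, X2, X3, X4.
    # B[i, j] is the coefficient of variable j in the structural equation of variable i.
    B = np.zeros((4, 4))
    B[1, 0] = a12   # X1 -> X2
    B[1, 3] = a42   # X4 -> X2
    B[2, 0] = a13   # X1 -> X3 (assumed optional edge)
    B[2, 1] = a23   # X2 -> X3
    B[2, 3] = a43   # X4 -> X3
    mixing = np.linalg.inv(np.eye(4) - B)
    full = mixing @ np.diag([d1, d2, d3, d4]) @ mixing.T
    return full[:3, :3]

def estimation_error(Sigma, a23):
    """g = Sigma_32 / Sigma_22 - alpha_23."""
    return Sigma[2, 1] / Sigma[1, 1] - a23

# Without confounding (a43 = 0) the regression slope is exact.
g_chain = estimation_error(observed_covariance(0.8, 0.5, 1.0, 0.0, 1, 1, 1), 0.5)
# With confounding it is biased.
g_conf = estimation_error(observed_covariance(0.8, 0.5, 1.0, 1.0, 1, 1, 1), 0.5)
print(g_chain, g_conf)
```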
Constraint equations
Proposition. There exists a map $\Phi : (A, \Delta) \mapsto \Sigma$ from the model parameters to the observed covariance matrix that defines a set of polynomial equations.
From a geometrical point of view, given $\Sigma$, the compatible parameters satisfy $(A, \Delta) \in M \subset \mathbb{R}^9$, and $\Phi$ maps every point of $M$ to $\Sigma$.
Non-identification of the model parameters
In our model the map $\Phi$ is not injective, so $\Phi^{-1}$ is not well defined and the manifold $M$ does not reduce to a single point.
Nevertheless, it is still an interesting question whether the function $g$ is bounded on $M$ or not.
Main result
Theorem. There exists a map
$$\Psi(\Sigma, \delta_2^2, \delta_3^2, s_1, s_2) = A,$$
where $s_1, s_2$ are two signs and $\delta_2^2, \delta_3^2$ are the variances of the noise sources of $X_2$ and $X_3$, respectively.
Corollary. The error in the causal effect estimation function $g$ can be expressed as
$$g\big(\Sigma, \Psi(\Sigma, \delta_2^2, \delta_3^2, s_1, s_2)\big) = \underbrace{\frac{\vartheta\,\Sigma_{12}}{m\,\Sigma_{22}}}_{\text{small for weak dep.}} + \underbrace{s_1 s_2\, \frac{\sqrt{\det\Sigma - m\,\delta_3^2}\;\sqrt{m - \Sigma_{11}\,\delta_2^2}}{m\,\sqrt{\delta_2^2}}}_{\text{arbitrarily large}},$$
where $\vartheta = \Sigma_{13}\Sigma_{22} - \Sigma_{12}\Sigma_{23}$ and $m = \Sigma_{11}\Sigma_{22} - \Sigma_{12}^2$.
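The closed form in the corollary can be checked numerically against a forward simulation of the model. The sketch below assumes that $s_1, s_2$ are the signs of the two confounder coefficients, that the confounder has unit variance, and that a direct edge from $X_1$ to $X_3$ may be present; these readings, the function names, and the parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def observed_covariance(a12, a13, a23, a42, a43, d1, d2, d3):
    """Covariance of (X1, X2, X3) with a hidden unit-variance confounder X4."""
    B = np.zeros((4, 4))  # B[i, j]: coefficient of variable j in the equation of variable i
    B[1, 0], B[1, 3] = a12, a42
    B[2, 0], B[2, 1], B[2, 3] = a13, a23, a43
    mixing = np.linalg.inv(np.eye(4) - B)
    return (mixing @ np.diag([d1, d2, d3, 1.0]) @ mixing.T)[:3, :3]

def error_from_sigma(Sigma, d2, d3, s1, s2):
    """Evaluate the corollary's expression for g from Sigma, the noise variances and two signs."""
    S = np.asarray(Sigma, dtype=float)
    m = S[0, 0] * S[1, 1] - S[0, 1] ** 2
    theta = S[0, 2] * S[1, 1] - S[0, 1] * S[1, 2]
    det = np.linalg.det(S)
    small = theta * S[0, 1] / (m * S[1, 1])
    # max(..., 0) guards against tiny negative values caused by rounding.
    root = math.sqrt(max(det - m * d3, 0.0)) * math.sqrt(max(m - S[0, 0] * d2, 0.0))
    return small + s1 * s2 * root / (m * math.sqrt(d2))

# Made-up ground truth with a negative confounder coefficient on X3.
a23, d2, d3 = 0.5, 0.6, 0.9
Sigma = observed_covariance(0.8, 0.2, a23, 1.0, -0.7, 1.5, d2, d3)
g_direct = Sigma[2, 1] / Sigma[1, 1] - a23
g_formula = error_from_sigma(Sigma, d2, d3, s1=+1, s2=-1)
print(g_direct, g_formula)
```

If the two printed values agree, the formula reproduces the error using only $\Sigma$, the two noise variances, and the sign product.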
Approaching the singularity
Proposition.
$$\lim_{\delta_2^2 \to 0} |g| = +\infty \qquad \forall\, \delta_3^2 \in [0, \det\Sigma / m), \quad (s_1, s_2) \in \{-1, 1\}^2.$$
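The divergence can be seen numerically by evaluating the expression for $g$ at a fixed $\Sigma$ while shrinking $\delta_2^2$. The sketch below uses a covariance matrix that is exactly compatible with a chain (so $\vartheta = 0$); the matrix, the chosen $\delta_3^2$, and the function name are illustrative assumptions.

```python
import math
import numpy as np

def error_from_sigma(Sigma, d2, d3, s1, s2):
    """g as a function of Sigma, the noise variances d2, d3 and the signs s1, s2."""
    S = np.asarray(Sigma, dtype=float)
    m = S[0, 0] * S[1, 1] - S[0, 1] ** 2
    theta = S[0, 2] * S[1, 1] - S[0, 1] * S[1, 2]
    det = np.linalg.det(S)
    root = math.sqrt(max(det - m * d3, 0.0)) * math.sqrt(max(m - S[0, 0] * d2, 0.0))
    return theta * S[0, 1] / (m * S[1, 1]) + s1 * s2 * root / (m * math.sqrt(d2))

# Covariance of the chain X1 -> X2 -> X3 with weights 0.8, 0.5 and unit noise variances.
Sigma_chain = np.array([[1.0, 0.8, 0.4],
                        [0.8, 1.64, 0.82],
                        [0.4, 0.82, 1.41]])

d2_values = [1e-2, 1e-4, 1e-6]
abs_errors = [abs(error_from_sigma(Sigma_chain, d2, 0.5, 1, 1)) for d2 in d2_values]
for d2, err in zip(d2_values, abs_errors):
    print(d2, err)
```

The error grows like $1/\sqrt{\delta_2^2}$, even though the covariance matrix itself is unchanged.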
Probabilistic estimation of the error
The noise variances range over $(\delta_2^2, \delta_3^2) \in D(\Sigma) \subset \mathbb{R}^2$. Define
$$\mathcal{M}_M = \{ (\delta_2^2, \delta_3^2) : |g| \le M \}.$$
If we put a uniform prior on the noise variances,
$$\Pr(|g| \le M) = \frac{\|\mathcal{M}_M\|}{\|D(\Sigma)\|}.$$
What would be a reasonable prior distribution for $\delta_2^2, \delta_3^2$?
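The ratio of areas can be approximated on a grid. The sketch below assumes that $D(\Sigma)$ is the rectangle on which both square roots in the expression for $g$ are real, that is $0 < \delta_2^2 \le m/\Sigma_{11}$ and $0 \le \delta_3^2 \le \det\Sigma/m$; this reading of $D(\Sigma)$, the fixed sign product, and all names are assumptions.

```python
import numpy as np

def prob_error_below(Sigma, M, sign=1, grid=400):
    """Approximate Pr(|g| <= M) under a uniform prior on (d2, d3) over the assumed rectangle."""
    S = np.asarray(Sigma, dtype=float)
    m = S[0, 0] * S[1, 1] - S[0, 1] ** 2
    theta = S[0, 2] * S[1, 1] - S[0, 1] * S[1, 2]
    det = np.linalg.det(S)
    u = (np.arange(grid) + 0.5) / grid      # midpoints of a regular grid on (0, 1)
    d2 = u[:, None] * (m / S[0, 0])         # column of delta_2^2 values
    d3 = u[None, :] * (det / m)             # row of delta_3^2 values
    root = np.sqrt(np.clip(det - m * d3, 0.0, None)) * np.sqrt(np.clip(m - S[0, 0] * d2, 0.0, None))
    g = theta * S[0, 1] / (m * S[1, 1]) + sign * root / (m * np.sqrt(d2))
    return float(np.mean(np.abs(g) <= M))

# Chain-compatible covariance (weights 0.8 and 0.5, unit noise variances).
Sigma_chain = np.array([[1.0, 0.8, 0.4],
                        [0.8, 1.64, 0.82],
                        [0.4, 0.82, 1.41]])
for M in (0.5, 1.0, 2.0):
    print(M, prob_error_below(Sigma_chain, M))
```

A different prior would weight the region near $\delta_2^2 = 0$ differently, which is why the choice of prior matters for this probability.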
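Looking for an approximate bound
The causal effect error function $g$ can be optimized over the parameter $\delta_3^2$, giving a confidence interval for the causal weight $\alpha_{23}$:
$$\alpha_{23} \in [b_-, b_+] \subset \mathbb{R}, \qquad b_\pm(\delta_2^2) = \frac{\gamma}{m} \pm \frac{\sqrt{\det\Sigma}\;\sqrt{m - \Sigma_{11}\,\delta_2^2}}{m\,\sqrt{\delta_2^2}},$$
where $\gamma = \Sigma_{11}\Sigma_{23} - \Sigma_{12}\Sigma_{13}$.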
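The interval endpoints are simple to evaluate for a given $\delta_2^2$. The sketch below reads $\gamma$ as $\Sigma_{11}\Sigma_{23} - \Sigma_{12}\Sigma_{13}$, which is the value implied by writing $\alpha_{23} = \Sigma_{32}/\Sigma_{22} - g$ and taking $\delta_3^2 = 0$; that reading, the function name, and the example matrix are assumptions.

```python
import math
import numpy as np

def effect_interval(Sigma, d2):
    """Return (b_minus, b_plus) for alpha_23 at a given noise variance d2 of X2."""
    S = np.asarray(Sigma, dtype=float)
    m = S[0, 0] * S[1, 1] - S[0, 1] ** 2
    gamma = S[0, 0] * S[1, 2] - S[0, 1] * S[0, 2]
    det = np.linalg.det(S)
    half_width = math.sqrt(det) * math.sqrt(max(m - S[0, 0] * d2, 0.0)) / (m * math.sqrt(d2))
    centre = gamma / m
    return centre - half_width, centre + half_width

# Chain-compatible covariance (weights 0.8 and 0.5, unit noise variances).
Sigma_chain = np.array([[1.0, 0.8, 0.4],
                        [0.8, 1.64, 0.82],
                        [0.4, 0.82, 1.41]])
for d2 in (0.1, 0.5, 1.0):
    print(d2, effect_interval(Sigma_chain, d2))
```

The interval widens without limit as `d2` approaches zero and collapses to a point when `d2` reaches its largest admissible value $m/\Sigma_{11}$.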
Looking for an approximate bound
Suppose we had a lower bound $\delta_2^2 \ge \hat{\delta}_2^2$; this would imply an upper bound on $|g|$.
What would be a practical example where we can assume such a lower bound for the variance $\delta_2^2$?
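Given such a lower bound, the worst case of $|g|$ follows from the expression for $g$: the square-root term is largest at $\delta_3^2 = 0$ and at the smallest allowed $\delta_2^2$, and the sign product can align it with the first term. The sketch below computes that worst case; the function name and the example values are hypothetical.

```python
import math
import numpy as np

def max_abs_error(Sigma, d2_lower):
    """Upper bound on |g| over all d2 >= d2_lower, all admissible d3, and both sign products."""
    S = np.asarray(Sigma, dtype=float)
    m = S[0, 0] * S[1, 1] - S[0, 1] ** 2
    theta = S[0, 2] * S[1, 1] - S[0, 1] * S[1, 2]
    det = np.linalg.det(S)
    small = abs(theta * S[0, 1] / (m * S[1, 1]))
    # The second term decreases in both d2 and d3, so it is maximal at d2 = d2_lower, d3 = 0.
    large = math.sqrt(det) * math.sqrt(max(m - S[0, 0] * d2_lower, 0.0)) / (m * math.sqrt(d2_lower))
    return small + large

# Chain-compatible covariance (weights 0.8 and 0.5, unit noise variances).
Sigma_chain = np.array([[1.0, 0.8, 0.4],
                        [0.8, 1.64, 0.82],
                        [0.4, 0.82, 1.41]])
for bound in (0.05, 0.2, 0.5):
    print(bound, max_abs_error(Sigma_chain, bound))
```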
Conclusions
The causal effect estimation error is sensitive to erroneous conclusions in conditional independence tests.
The result is in accord with Robins et al. (2003) on the lack of uniform consistency of causal discovery algorithms, but in this paper we wish to emphasize the more practical matter of type II errors.
In our case it was not possible to identify the model parameters explicitly.
Proposal for future work
Bayesian model selection: what would be a reasonable prior distribution for the model parameters?
Bayesian Information Criterion: will the BIC still give reasonable results even though the model parameters are not identifiable? Could it deal with irregular or even singular models?
Proposal for future work
Adding an “environment” variable: might it be reasonable to assume that a part, or most, of the external variability is carried by the covariance between the environment variable $W$ and the other measured ones, including possible confounders?
[Figure: causal graph over $W$, $X_4$, $X_1$, $X_2$, $X_3$.]
Thanks for your attention!