

  1. Non-parametric causal models. Robin J. Evans, Thomas S. Richardson. Oxford and Univ. of Washington. UAI Tutorial, 12th July 2015.

  2. Structure. Part One: Causal DAGs with latent variables. Part Two: Statistical models arising from DAGs with latents.

  3. Outline for Part One:
     - Intervention distributions
     - The general identification problem
     - Tian's ID Algorithm
     - Fixing: generalizing marginalizing and conditioning
     - Non-parametric constraints, aka Verma constraints

  4. Intervention distributions (I). Given a causal DAG G with distribution

       p(V) = ∏_{v ∈ V} p(v | pa(v)),

     we wish to compute an intervention distribution via the truncated factorization:

       p(V \ X | do(X = x)) = ∏_{v ∈ V \ X} p(v | pa(v)).

  5. Example. [Figure: the DAG L → X → M → Y with L → Y; on the right, the same graph after intervening on X, with the edge L → X removed.]

       p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L, M)

       p(L, M, Y | do(X = x̃)) = p(L) · p(M | x̃) · p(Y | L, M)
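To make the truncated factorization concrete, here is a minimal numeric sketch for this example DAG. It is not from the tutorial, and all probability tables are made-up illustrative numbers.

```python
# Truncated factorization for the DAG L -> X -> M -> Y with L -> Y.
# All conditional probability tables (CPTs) below are made-up numbers.
import numpy as np

pL = np.array([0.6, 0.4])                    # p(L)
pX_L = np.array([[0.7, 0.3], [0.2, 0.8]])    # p(X | L), rows indexed by L
pM_X = np.array([[0.9, 0.1], [0.4, 0.6]])    # p(M | X), rows indexed by X
pY_LM = np.array([[[0.8, 0.2], [0.5, 0.5]],  # p(Y | L, M), indexed [L][M]
                  [[0.3, 0.7], [0.1, 0.9]]])

x_tilde = 1  # intervene: do(X = 1)

# Drop the factor p(X | L) and plug X = x_tilde into the remaining factors:
# p(l, m, y | do(X = x~)) = p(l) p(m | x~) p(y | l, m)
p_do = np.einsum('l,m,lmy->lmy', pL, pM_X[x_tilde], pY_LM)
print(p_do.sum())  # sanity check: a proper distribution, sums to 1
```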

  6. Intervention distributions (II). Given a causal DAG G with distribution

       p(V) = ∏_{v ∈ V} p(v | pa(v)),

     we wish to compute an intervention distribution via the truncated factorization:

       p(V \ X | do(X = x)) = ∏_{v ∈ V \ X} p(v | pa(v)).

     Hence if we are interested in Y ⊂ V \ X then we simply marginalize:

       p(Y | do(X = x)) = ∑_{w ∈ V \ (X ∪ Y)} ∏_{v ∈ V \ X} p(v | pa(v)).

     This is the 'g-computation' formula of Robins (1986). Note: p(Y | do(X = x)) is a sum over a product of terms p(v | pa(v)).
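In the numeric sketch above (run in the same session), g-computation is nothing more than marginalizing the truncated factorization:

```python
# g-computation: marginalize the truncated factorization over everything
# except the outcome Y (continues the sketch above).
p_Y_do = p_do.sum(axis=(0, 1))  # sum out l and m, leaving p(Y | do(X = x~))
print(p_Y_do)
```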

  7. Example (continued). [Figure: the same DAG and its mutilated version as before.]

       p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L, M)

       p(L, M, Y | do(X = x̃)) = p(L) p(M | x̃) p(Y | L, M)

       p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l, M = m)

     Note that p(Y | do(X = x̃)) ≠ p(Y | X = x̃).
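This inequality can be checked numerically. Continuing the sketch (the variable names are ours, not the tutorial's), compute the observational conditional from the full joint and compare:

```python
# p(Y | X = x~) differs from p(Y | do(X = x~)) because L confounds X and Y.
p_joint = np.einsum('l,lx,xm,lmy->lxmy', pL, pX_L, pM_X, pY_LM)  # p(l,x,m,y)
p_Y_given_x = p_joint[:, x_tilde].sum(axis=(0, 1)) / p_joint[:, x_tilde].sum()
print(p_Y_given_x)  # observational: conditioning on X = x~
print(p_Y_do)       # interventional: generally different
```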

  8. Example: no effect of M on Y. [Figure: the same pair of graphs, but with the edge M → Y removed, so that Y depends only on L.]

       p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L)

       p(L, M, Y | do(X = x̃)) = p(L) p(M | x̃) p(Y | L)

       p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l)
                         = ∑_l p(L = l) p(Y | L = l)
                         = p(Y)
                         ≠ p(Y | x̃), since X and Y are dependent.

     'Correlation is not Causation.'
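A numeric check of this 'no effect, yet dependence' situation, in the same hypothetical setup as before (p(Y | L) is another made-up table):

```python
# With the edge M -> Y removed, interventions on X leave Y untouched,
# but X and Y are still observationally dependent through L.
pY_L = np.array([[0.8, 0.2], [0.3, 0.7]])  # p(Y | L), a made-up table
p_joint2 = np.einsum('l,lx,xm,ly->lxmy', pL, pX_L, pM_X, pY_L)
p_Y_marg = p_joint2.sum(axis=(0, 1, 2))                    # p(Y)
p_Y_do2 = np.einsum('l,m,ly->y', pL, pM_X[x_tilde], pY_L)  # p(Y | do(X=x~))
p_Y_x2 = p_joint2[:, x_tilde].sum(axis=(0, 1)) / p_joint2[:, x_tilde].sum()
print(p_Y_do2, p_Y_marg)  # equal: the intervention has no effect on Y
print(p_Y_x2)             # different: correlation without causation
```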

  9. Example with M unobserved. [Figure: the same pair of graphs, with M latent.]

       p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l, M = m)
                         = ∑_{l,m} p(L = l) p(M = m | x̃, L = l) p(Y | L = l, M = m, X = x̃)
                         = ∑_{l,m} p(L = l) p(Y, M = m | L = l, X = x̃)
                         = ∑_l p(L = l) p(Y | L = l, X = x̃).

     Here we have used that M ⊥⊥ L | X and Y ⊥⊥ X | L, M.

     ⇒ we can find p(Y | do(X = x̃)) even if M is not observed. This is an example of the 'back door formula'.
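A back-door adjustment sketch in the same running example, using only the observed margin over (L, X, Y):

```python
# Back-door adjustment: recover p(Y | do(X = x~)) from p(L, X, Y) alone,
# adjusting for L (continues the sketch; M never appears below).
p_LXY = p_joint.sum(axis=2)                              # p(l, x, y)
p_L = p_LXY.sum(axis=(1, 2))                             # p(l)
p_Y_given_LX = p_LXY / p_LXY.sum(axis=2, keepdims=True)  # p(y | l, x)
p_Y_backdoor = np.einsum('l,ly->y', p_L, p_Y_given_LX[:, x_tilde])
print(p_Y_backdoor, p_Y_do)  # agrees with the truncated factorization
```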

  10. Example with L unobserved. [Figure: the same pair of graphs, with L latent.]

       p(Y | do(X = x̃)) = ∑_m p(M = m | do(X = x̃)) p(Y | do(M = m))
                         = ∑_m p(M = m | X = x̃) p(Y | do(M = m))
                         = ∑_m p(M = m | X = x̃) [ ∑_{x*} p(X = x*) p(Y | M = m, X = x*) ]

     ⇒ we can find p(Y | do(X = x̃)) even if L is not observed. This is an example of the 'front door formula'.
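The front-door computation in the same running example, using only the observed margin over (X, M, Y):

```python
# Front-door adjustment: recover p(Y | do(X = x~)) from p(X, M, Y) alone,
# via the mediator M (continues the sketch; L never appears below).
p_XMY = p_joint.sum(axis=0)                              # p(x, m, y)
p_X = p_XMY.sum(axis=(1, 2))                             # p(x)
p_XM = p_XMY.sum(axis=2)                                 # p(x, m)
p_M_given_X = p_XM / p_XM.sum(axis=1, keepdims=True)     # p(m | x)
p_Y_given_XM = p_XMY / p_XMY.sum(axis=2, keepdims=True)  # p(y | x, m)
p_Y_frontdoor = np.einsum('m,x,xmy->y',
                          p_M_given_X[x_tilde], p_X, p_Y_given_XM)
print(p_Y_frontdoor, p_Y_do)  # agrees with the truncated factorization
```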

  11. But with both L and M unobserved... [Figure: the DAG with L and M both latent] ...we are out of luck! Given p(X, Y), absent further assumptions we cannot distinguish: [Figure: a graph in which a latent L confounds X and Y, versus a graph in which X causes Y through a latent M.]
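This non-identifiability can be seen concretely. Here is a small sketch (our construction, not the tutorial's) of two models with identical observed p(X, Y) but different interventional answers:

```python
# Two models agreeing on p(X, Y) but disagreeing about p(Y | do(X = 1)).
import numpy as np

p_XY = np.array([[0.4, 0.1],   # an arbitrary observed joint p(x, y)
                 [0.1, 0.4]])

# Model B: X -> Y directly, no confounding, so p(y | do(x)) = p(y | x).
pY_do_B = p_XY[1] / p_XY[1].sum()

# Model A: a latent L determines X and Y jointly and there is no X -> Y
# edge, so intervening on X does nothing: p(y | do(x)) = p(y).
pY_do_A = p_XY.sum(axis=0)

print(pY_do_A, pY_do_B)  # different causal conclusions, same data
```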

  12. General Identification Question. Given: a latent DAG G(O ∪ H), where O are observed, H are hidden, and disjoint subsets X, Y ⊆ O. Q: Is p(Y | do(X)) identified given p(O)? A: Provide either an identifying formula that is a function of p(O), or report that p(Y | do(X)) is not identified.

  13. Latent Projection. We can preserve the conditional independences and the causal structure of a DAG with latent variables by projecting along paths. For a DAG G on vertices V = O ∪̇ H, define the latent projection as follows (Verma and Pearl, 1992); a code sketch follows this list.
     - Whenever there is a directed path x → h₁ → · · · → h_k → y with every h_i ∈ H, add the edge x → y.
     - Whenever there is a path x ← h₁ · · · h_k → y on which every non-endpoint vertex is a non-collider in H, add the edge x ↔ y.
     - Then remove all latent variables H from the graph.
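A minimal implementation sketch of this projection, assuming a DAG given as a list of directed edges; the representation and names are ours, and the bidirected rule uses the equivalent characterization that x ↔ y whenever some hidden vertex reaches both x and y through hidden intermediates.

```python
# Latent projection sketch (Verma & Pearl, 1992).  A DAG is a list of
# (parent, child) pairs; `hidden` is the set of latent vertices.
from collections import defaultdict

def latent_projection(edges, hidden):
    """Return the directed and bidirected edges of the projected ADMG."""
    children = defaultdict(list)
    vertices = set()
    for a, b in edges:
        children[a].append(b)
        vertices |= {a, b}
    observed = vertices - set(hidden)

    def reach(v):
        # Observed vertices reachable from v by directed paths whose
        # intermediate vertices are all hidden.
        out, seen, stack = set(), set(), list(children[v])
        while stack:
            w = stack.pop()
            if w in seen:
                continue
            seen.add(w)
            if w in observed:
                out.add(w)
            else:
                stack.extend(children[w])
        return out

    directed = {(a, b) for a in observed for b in reach(a)}
    # x <-> y iff some hidden h reaches both x and y through hidden paths
    bidirected = set()
    for h in hidden:
        r = sorted(reach(h))
        bidirected |= {(a, b) for a in r for b in r if a < b}
    return directed, bidirected

# Example: project L -> X -> M -> Y, L -> Y with L hidden.
print(latent_projection([('L','X'), ('X','M'), ('M','Y'), ('L','Y')], {'L'}))
# directed: X -> M, M -> Y; bidirected: X <-> Y (from the hidden L)
```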

  14. ADMGs. [Figure: a DAG on x, z, u, w, t, y with u and w latent, projected onto an ADMG on x, z, t, y.]

     Latent projection leads to an acyclic directed mixed graph (ADMG). We can read off independences with d/m-separation. The projection preserves the causal structure; Verma and Pearl (1992).
