

  1. Causal Inference – Theory and Applications Dr. Matthias Uflacker, Johannes Huegle, Christopher Schmidt April 24, 2018

  2. Agenda April 24, 2018
      ■ Jupyter Notebook "Causal Inference in Application"
      ■ Recap: Causal Inference in a Nutshell
      ■ Introduction to Structural Causal Models
        1. Preliminaries
        2. Structural Causal Models
        3. (Local) Markov Condition
        4. Factorization
        5. Global Markov Condition
        6. Functional Model and Markov Conditions
        7. Faithfulness
        8. Constraint-based Causal Inference
        9. Markov Equivalence Class
        10. Summary
        11. Excursion: Maximal Ancestral Graphs

  3. Jupyter Notebook “Causal Inference in Application”

  4. Jupyter Notebook "Causal Inference in Application"

  5. Jupyter Notebook Access Information
      ■ System: the link will be provided via email once we have the list of participants!
      ■ Procedure:
        1. Log in via LDAP (standard HPI credentials).
        2. Use the folder "Causal Inference – Theory and Applications".
        3. We provide a Master Notebook. Please use it as a read-only resource and copy relevant information into your local workspace.
        4. Your local workspace is either in your home directory or a separate folder in our courses' folder.
        5. Let us know if you require new packages.

  6. Causal Inference in a Nutshell

  7. Causal Inference in a Nutshell Recap: The Concept
      ■ Traditional Statistical Paradigm: inference of aspects of the joint distribution P from data.
        E.g., what is the sailors' probability of recovery when we see a treatment with lemons?
        Q_P = P(recovery | lemons)
      ■ Structural Inference Paradigm of Causal Models: inference of aspects of the data generating model G from data.
        E.g., what is the sailors' probability of recovery if we do treat them with lemons?
        Q_G = P(recovery | do(lemons))

  8. Introduction to Structural Causal Models

  9. Introduction to Causal Graphical Models Content
        1. Preliminaries
        2. Structural Causal Models
        3. (Local) Markov Condition
        4. Factorization
        5. Global Markov Condition
        6. Functional Model and Markov Conditions
        7. Faithfulness
        8. Constraint-based Causal Inference
        9. Markov Equivalence Class
        10. Summary
        11. Excursion: Maximal Ancestral Graphs

  10. 1. Preliminaries Notation
      ■ A, B: events
      ■ X, Y, Z: random variables
      ■ x: value of a random variable
      ■ Pr: probability measure
      ■ P_X: probability distribution of X
      ■ p: density
      ■ p(x) or p_X: density of P_X
      ■ p(x): density of P_X evaluated at the point x
      ■ X ⊥ Y: independence of X and Y
      ■ X ⊥ Y | Z: conditional independence of X and Y given Z

  11. 1. Preliminaries Independence of Events
      ■ Two events A and B are called independent if
        Pr(A ∩ B) = Pr(A) ⋅ Pr(B),
        or - rewritten in conditional probabilities - if
        Pr(A | B) = Pr(A),
        Pr(B | A) = Pr(B).
      ■ A_1, …, A_n are called (mutually) independent if for every subset S ⊆ {1, …, n} we have
        Pr(∩_{i∈S} A_i) = ∏_{i∈S} Pr(A_i).
      ■ Note: for n ≥ 3, pairwise independence Pr(A_i ∩ A_j) = Pr(A_i) ⋅ Pr(A_j) for all i, j does not imply (mutual) independence.
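The note above can be made concrete with the classic two-coin XOR construction (an illustrative example, not from the slides): three events that are pairwise independent yet not mutually independent, computed exactly over the sample space.

```python
from itertools import product

# Two fair coins; each outcome in {0,1}^2 has probability 1/4.
# A = "coin 1 is heads", B = "coin 2 is heads", C = "the coins differ" (XOR).
outcomes = list(product([0, 1], repeat=2))
p = 1 / len(outcomes)  # uniform probability of each outcome

A = {w for w in outcomes if w[0] == 1}
B = {w for w in outcomes if w[1] == 1}
C = {w for w in outcomes if w[0] != w[1]}

def pr(event):
    return p * len(event)

# Pairwise independence holds for every pair ...
for E, F in [(A, B), (A, C), (B, C)]:
    assert abs(pr(E & F) - pr(E) * pr(F)) < 1e-12

# ... but mutual independence fails: A ∩ B ∩ C is empty.
print(pr(A & B & C), pr(A) * pr(B) * pr(C))  # 0.0 vs 0.125
```

Knowing any two of the events determines the third, which is exactly the dependence that pairwise checks cannot see.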

  12. 1. Preliminaries Independence of Random Variables
      ■ Two real-valued random variables X and Y are called independent, X ⊥ Y, if for every x, y ∈ ℝ the events {X ≤ x} and {Y ≤ y} are independent. Or, in terms of densities: for all x, y,
        p(x, y) = p(x) p(y).
      ■ Note: If X ⊥ Y, then E[XY] = E[X] E[Y], and
        cov(X, Y) = E[XY] − E[X] E[Y] = 0.
        The converse is not true: cov(X, Y) = 0 does not imply X ⊥ Y. No correlation does not imply independence.
        However, for a sufficiently large class of functions ℱ we have: if cov(f(X), g(Y)) = 0 for all f, g ∈ ℱ, then X ⊥ Y.
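A standard counterexample for the converse, computed exactly (the choice X uniform on {-1, 0, 1} with Y = X² is illustrative, not from the slides): the covariance vanishes even though Y is a deterministic function of X.

```python
# X uniform on {-1, 0, 1}, Y = X^2: covariance is exactly 0, yet Y is a
# deterministic function of X, so the two are clearly dependent.
xs = [-1, 0, 1]
px = 1 / 3

EX = sum(px * x for x in xs)              # 0
EY = sum(px * x * x for x in xs)          # 2/3
EXY = sum(px * x * (x * x) for x in xs)   # 0 (odd function of x)
cov = EXY - EX * EY
print(cov)  # 0.0

# Dependence: p(x, y) != p(x) p(y), e.g. at (X, Y) = (1, 1):
p_joint_11 = px            # P(X=1, Y=1) = P(X=1) = 1/3
p_marg = px * (2 / 3)      # P(X=1) * P(Y=1) = 2/9
print(p_joint_11, p_marg)
```

Checking cov(f(X), g(Y)) = 0 over a richer function class (here, e.g., f(x) = x² would expose the dependence) is what the last bullet refers to.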

  13. 1. Preliminaries Conditional Independence of Random Variables
      ■ Two real-valued random variables X and Y are called conditionally independent given Z, written X ⊥ Y | Z or (X ⊥ Y | Z)_P, if
        p(x, y | z) = p(x | z) p(y | z)
        for all x, y and for all z s.t. p(z) > 0.
      ■ Note: It is possible to find X, Y which are conditionally independent given a variable Z but unconditionally dependent, and vice versa.
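The note admits a quick simulation. A sketch of the classic common-cause case (Z → X, Z → Y; the probabilities 0.9/0.1 are hypothetical choices): X and Y are marginally dependent, but become independent once Z is conditioned on.

```python
import random

random.seed(0)
n = 200_000

# Common cause Z -> X, Z -> Y: both X and Y tend to follow Z.
data = []
for _ in range(n):
    z = random.random() < 0.5
    x = random.random() < (0.9 if z else 0.1)
    y = random.random() < (0.9 if z else 0.1)
    data.append((x, y, z))

def prob(pred, cond=lambda t: True):
    sel = [t for t in data if cond(t)]
    return sum(pred(t) for t in sel) / len(sel)

# Marginally dependent: P(X=1, Y=1) != P(X=1) P(Y=1)
pxy = prob(lambda t: t[0] and t[1])
px, py = prob(lambda t: t[0]), prob(lambda t: t[1])
print(round(pxy, 2), round(px * py, 2))  # ~0.41 vs ~0.25

# Conditionally independent given Z = 1: P(X, Y | Z) ~= P(X | Z) P(Y | Z)
cz = lambda t: t[2]
pxy_z = prob(lambda t: t[0] and t[1], cz)
px_z, py_z = prob(lambda t: t[0], cz), prob(lambda t: t[1], cz)
print(round(pxy_z, 2), round(px_z * py_z, 2))  # both ~0.81
```

The exact values are P(X=1, Y=1) = 0.41 versus 0.25 marginally, and 0.81 = 0.9 · 0.9 in both positions conditionally.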

  14. 2. Structural Causal Models Definition (Pearl)
      ■ Directed Acyclic Graph (DAG) G = (V, E)
        □ Vertices V_1, …, V_n
        □ Directed edges E = {(V_i, V_j)}, i.e., V_i → V_j
        □ No cycles
      ■ Use kinship terminology, e.g., for the path V_i → V_j → V_k:
        □ V_i = Pa(V_j): parent of V_j
        □ V_i, V_j = Anc(V_k): ancestors of V_k
        □ V_j, V_k = Des(V_i): descendants of V_i
      ■ Directed edges encode direct causes via
        V_j = f_j(Pa(V_j), N_j)
        with independent noise N_1, …, N_n. This forms the Causal Graphical Model.
      ■ Cooling house example (edges V_1 → V_4, V_2 → V_3, V_2 → V_4, V_3 → V_4, V_4 → V_5, V_4 → V_6):
        ▪ V_1 = N(0,1)
        ▪ V_2 = N(0,1)
        ▪ V_3 = 3 V_2 + N(0,1)
        ▪ V_4 = 4 V_1 + 5 V_2 + 0.7 V_3 + N(0,1)
        ▪ V_5 = V_4 + N(0,1)
        ▪ V_6 = 1.2 V_4 + N(0,1)
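The structural equations above can be sampled directly: draw each noise term independently and evaluate variables in topological order. A minimal sketch (sample size and seed are arbitrary choices):

```python
import random
import statistics

random.seed(1)
n = 50_000

# Cooling house example: each variable is a function of its parents in the
# DAG plus an independent N(0, 1) noise term, evaluated in topological order.
samples = []
for _ in range(n):
    n1, n2, n3, n4, n5, n6 = (random.gauss(0, 1) for _ in range(6))
    v1 = n1
    v2 = n2
    v3 = 3 * v2 + n3
    v4 = 4 * v1 + 5 * v2 + 0.7 * v3 + n4
    v5 = v4 + n5
    v6 = 1.2 * v4 + n6
    samples.append(v3)

# The model implies Var(V3) = 3^2 * Var(V2) + Var(N3) = 10.
print(round(statistics.variance(samples), 1))  # close to 10
```

Because the noise terms are independent, implied moments such as Var(V_3) follow directly from the coefficients, which gives a cheap sanity check on the simulation.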

  15. 2. Structural Causal Models Connecting G and P
      ■ Basic Assumption: Causal Sufficiency
        □ All relevant variables are included in the DAG G
      ■ Key Postulate: (Local) Markov Condition, connecting the data generating model and the joint distribution:
        (X ⊥ Y | Z)_G ⇒ (X ⊥ Y | Z)_P
      ■ Essential mathematical concept: d-separation (describes the conditional independences required by a causal DAG)

  16. 3. (Local) Markov Condition
      ■ Theorem (Local Markov Condition): V_j is statistically independent of its nondescendants, given its parents Pa(V_j), i.e.,
        V_j ⊥ V_{V \ Des(V_j)} | Pa(V_j).
      ■ I.e., every information exchange with its nondescendants involves its parents.
      ■ Example:
        ▪ V_6 ⊥ {V_1, V_2, V_3} | V_4
        ▪ V_5 ⊥ {V_1, V_2, V_3} | V_4

  17. 3. (Local) Markov Condition Supplement (Lauritzen 1996)
      ■ Assume V_n has no descendants; then ND(V_n) = {V_1, …, V_{n−1}}.
      ■ Thus the local Markov condition implies
        V_n ⊥ {V_1, …, V_{n−1}} | Pa(V_n).
      ■ Hence, the general decomposition
        p(v_1, …, v_n) = p(v_n | v_1, …, v_{n−1}) p(v_1, …, v_{n−1})
        becomes
        p(v_1, …, v_n) = p(v_n | Pa(v_n)) p(v_1, …, v_{n−1}).
      ■ Induction over n yields
        p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).
      ■ I.e., the graph shows us how to factor the joint distribution P_V.

  18. 4. Factorization
      ■ Definition (Factorization):
        p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).
      ■ I.e., conditionals act as causal mechanisms generating statistical dependence.
      ■ Example:
        p(V) = p(v_1, …, v_6)
             = p(v_1) ⋅ p(v_2) ⋅ p(v_3 | v_2) ⋅ p(v_4 | v_1, v_2, v_3) ⋅ p(v_5 | v_4) ⋅ p(v_6 | v_4)
             = ∏_{i=1}^6 p(v_i | Pa(v_i))
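The factorization can be verified numerically on a toy model. A sketch with a binary chain V1 → V2 → V3 and hand-picked (hypothetical) conditionals: the factorized joint is a valid distribution, and the conditional independence V1 ⊥ V3 | V2 holds by construction.

```python
from itertools import product

# A binary chain V1 -> V2 -> V3 with hypothetical conditional tables.
p1 = {0: 0.6, 1: 0.4}                              # p(v1)
p2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # p(v2 | v1)
p3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p(v3 | v2)

# Factorization: p(v1, v2, v3) = p(v1) p(v2 | v1) p(v3 | v2)
joint = {
    (a, b, c): p1[a] * p2[a][b] * p3[b][c]
    for a, b, c in product([0, 1], repeat=3)
}
print(round(sum(joint.values()), 10))  # 1.0: a valid distribution

def cond(vals, given):
    # conditional probability from the joint; vals/given are (index, value) pairs
    num = sum(v for k, v in joint.items()
              if all(k[i] == x for i, x in vals + given))
    den = sum(v for k, v in joint.items()
              if all(k[i] == x for i, x in given))
    return num / den

# The factorization enforces V1 ⊥ V3 | V2:
lhs = cond([(0, 1), (2, 1)], [(1, 0)])                     # p(v1=1, v3=1 | v2=0)
rhs = cond([(0, 1)], [(1, 0)]) * cond([(2, 1)], [(1, 0)])  # product of conditionals
print(round(lhs, 10) == round(rhs, 10))  # True
```

Here p(v1=1, v3=1 | v2=0) = 0.2 · 0.1 = 0.02, exactly the product of the two conditionals, as the global Markov condition predicts for a blocked chain.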

  19. 5. Global Markov Condition D-Separation (Pearl 1988)
      ■ Path = sequence of pairwise distinct vertices where consecutive ones are adjacent.
      ■ A path q is said to be blocked by a set S if
        □ q contains a chain V_i → V_j → V_k or a fork V_i ← V_j → V_k such that the middle node is in S, or
        □ q contains a collider V_i → V_j ← V_k such that the middle node is not in S and such that no descendant of V_j is in S.
      ■ D-separation: S is said to d-separate X and Y in the DAG G, i.e.,
        (X ⊥ Y | S)_G,
        if S blocks every path from a vertex in X to a vertex in Y.
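The blocking rules can be checked mechanically. A minimal sketch for the running six-node cooling house graph: enumerate simple paths in the skeleton and apply the chain/fork/collider rules literally (exponential in general, only suitable for toy graphs; graph libraries such as networkx offer proper d-separation tests):

```python
# Edges of the running example DAG: V1 -> V4, V2 -> V3, V2 -> V4,
# V3 -> V4, V4 -> V5, V4 -> V6 (nodes named by index).
edges = {(1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6)}

def descendants(v):
    out, stack = set(), [v]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def paths(x, y, path):
    # all simple paths from x to y in the skeleton (directions ignored)
    if x == y:
        yield path
        return
    for (a, b) in edges:
        for u, v in ((a, b), (b, a)):
            if u == x and v not in path:
                yield from paths(v, y, path + [v])

def blocked(path, S):
    # blocked iff some triple is a chain/fork with middle node in S,
    # or a collider whose middle node and descendants all lie outside S
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in edges and (b, m) in edges
        if collider:
            if m not in S and not (descendants(m) & S):
                return True
        elif m in S:
            return True
    return False

def d_separated(x, y, S):
    return all(blocked(p, S) for p in paths(x, y, [x]))

print(d_separated(1, 2, set()))  # True: every V1-V2 path has collider V4
print(d_separated(1, 2, {4}))    # False: conditioning on the collider opens it
print(d_separated(5, 6, {4}))    # True: the fork V5 <- V4 -> V6 is blocked by V4
```

The second call shows the characteristic collider behaviour: V1 and V2 are marginally d-separated but become d-connected once V4 is conditioned on.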
