Causality
V. Bunkin, L. Steffen (Seminar in Statistics)
02.05.2016

SLIDE 1

Causality

  • V. Bunkin, L. Steffen (Seminar in Statistics)

1 / 23

SLIDE 2

"According to studies..."

2 / 23

SLIDE 3

  • What would be the right question in this setting?
  • Numerous studies concern themselves only with correlation
  • This can be very misleading
  • http://tylervigen.com/spurious-correlations

3 / 23

SLIDE 4

Simpson's Paradox

Figure: Success rates of two treatments for kidney stones

  • Treatment B seems to perform better overall (83%)
  • But treatment A performs better in both settings

4 / 23
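The figure itself is not reproduced here, but the paradox is easy to recompute. The numbers below are the classic kidney-stone figures from Charig et al. (1986), assumed here since the slide only quotes the 83% overall rate for treatment B:

```python
# Simpson's paradox: per-group vs. overall success rates.
# (successes, patients) per treatment and stone size; data assumed
# from Charig et al. (1986), matching the 83% quoted on the slide.
success = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    return successes / patients

for treatment, groups in success.items():
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    per_group = {g: f"{rate(*v):.0%}" for g, v in groups.items()}
    print(treatment, per_group, f"overall: {total_s / total_n:.0%}")
```

Treatment A wins within each stone size, yet B wins overall, because A was given mostly to the harder (large-stone) cases.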

SLIDE 5

Posing correct questions

  • Correlation vs. causality
  • Missing background knowledge can lead to false conclusions
  • Correlation does not imply causality
  • Mostly we are interested in whether A has a direct effect on (causes) B
  • What are the possible causal explanations if A is correlated with B?

– i) A causes B, ii) B causes A, or iii) a hidden actor Z causes both A and B
– Reichenbach's common cause principle is provable

5 / 23

SLIDE 6

Hidden Cause/Actor

  • In 1999 research established a significant correlation between the presence of a nightlight in a child's bedroom and myopia (shortsightedness).
  • In 2000 follow-up research found that parents with myopia are more likely to put a nightlight in their child's bedroom, and their children are also more inclined to develop myopia for genetic reasons.

6 / 23

SLIDE 7

Motivation

"Correlation does not imply causation"

Two SEMs with different causal structures can induce the same observational distribution, P^X = P̃^X, yet behave differently under interventions.

7 / 23

SLIDE 8

Graphs

  • A graph G = (V, E) consists of nodes V and edges E ⊆ V² with (v, v) ∉ E for all v ∈ V.
  • i is a parent of j if (i, j) ∈ E and (j, i) ∉ E; j is then a child of i
  • an edge is undirected if (i, j) ∈ E and (j, i) ∈ E, otherwise it is directed
  • 3 nodes form an immorality (or v-structure) if one is the child of the two others, which themselves are not adjacent
  • a (directed) path is a sequence of distinct nodes i1, ..., in ∈ V with a (directed) edge between ik and ik+1 for all k = 1, ..., n − 1
  • all j with a directed path from i to j are called descendants of i; the set of all descendants of i is denoted by DE_i^G
  • we identify the nodes j ∈ V with the random variables Xj ∈ X

8 / 23

SLIDE 9

DAGs

  • a directed acyclic graph (DAG) is a graph G in which all edges are directed and there exists no pair (i, j) with directed paths both from i to j and from j to i
  • in a DAG, two disjoint sets A, B ⊂ V are d-separated by a third disjoint set S ⊂ V if every path between nodes in A and B is blocked by S, i.e. every path i1, ..., in contains some ik such that either

– ik ∈ S and ik−1 → ik → ik+1, or ik−1 ← ik ← ik+1, or ik−1 ← ik → ik+1
– ik−1 → ik ← ik+1 and neither ik nor any of its descendants is in S

9 / 23

SLIDE 10

Topological Ordering

Proposition: For each DAG there exists a topological ordering π ∈ Sp, that is, a bijective mapping

π : {1, ..., p} → {1, ..., p}

that satisfies π(i) < π(j) if j ∈ DE_i^G.

10 / 23
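A topological ordering can be computed constructively; a minimal sketch using Kahn's algorithm (node names and the edge-list representation are choices made here, not from the slides):

```python
from collections import deque

def topological_ordering(num_nodes, edges):
    """Kahn's algorithm: return an ordering in which every node precedes
    all of its descendants (such an ordering exists iff the graph is a DAG)."""
    children = {i: [] for i in range(num_nodes)}
    indegree = [0] * num_nodes
    for i, j in edges:                 # directed edge i -> j
        children[i].append(j)
        indegree[j] += 1
    queue = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in children[i]:          # removing i frees its children
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    if len(order) < num_nodes:         # a cycle blocked some nodes
        raise ValueError("graph is not a DAG")
    return order

# immorality 0 -> 2 <- 1: both parents come before the common child
print(topological_ordering(3, [(0, 2), (1, 2)]))  # [0, 1, 2]
```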

SLIDE 11

Structural Equation Model

Definition: A structural equation model (SEM) is S := (S, PN), where S = (S1, ..., Sp) is a collection of equations

Sj : Xj = fj(PAj, Nj),  j = 1, ..., p

and PN is the joint distribution of the noise variables N1, ..., Np.

11 / 23
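Operationally, an SEM is just a recipe for sampling: evaluate the structural equations in a topological order. A minimal sketch with a hypothetical three-variable chain (the equations and coefficients here are chosen only for illustration):

```python
import random

random.seed(1)

# Hypothetical SEM:  X1 = N1,  X2 = X1 + N2,  X3 = 2*X2 + N3,
# with N1, N2, N3 ~ N(0, 1) independent.
def sample():
    n1, n2, n3 = (random.gauss(0, 1) for _ in range(3))
    x1 = n1             # source node: pure noise
    x2 = x1 + n2        # depends on its parent X1
    x3 = 2 * x2 + n3    # depends on its parent X2
    return x1, x2, x3

draws = [sample() for _ in range(100_000)]
var_x3 = sum(x3 ** 2 for _, _, x3 in draws) / len(draws)
print(var_x3)  # ≈ 2²·Var(X2) + 1 = 4·2 + 1 = 9
```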

SLIDE 12

Interventions

Having established the SEM structure, we can now construct new distributions by changing (intervening upon) structural equations.

Definition (Intervention Distribution)

Consider an SEM S = (S, PN) with distribution P^X_S. We can now replace one or multiple equations and obtain a new SEM S̃. The new distribution P^X_S̃ is called the intervention distribution, and the variables whose structural equations have been changed have been intervened on. We introduce the do operator:

P^X_S̃ =: P^{X | do(Xj = f̃(P̃Aj, Ñj))}_S

12 / 23

SLIDE 13

Example for an Intervention (Kidney Stones)

13 / 23

SLIDE 14

  • New and old N's need to be independent
  • Two special cases

– The new equation can either keep the same parents but change their influence, or restructure the noise component (called imperfect)
– The new equation is of the type do(Xj = a) (called perfect)

  • Example: Suppose S is

X = NX
Y = 4 · X + NY,  with NX, NY ~ N(0, 1)

Compare the intervention distribution of Y for do(X = 2) and do(X = 3) with P^Y_S. Now reverse the roles of X and Y. What happens?

14 / 23
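A simulation sketch of the example above (function names are choices made here): do(X = a) replaces X's structural equation by the constant a while keeping the equation for Y, so Y | do(X = a) ~ N(4a, 1).

```python
import random

random.seed(0)

# SEM from the slide: X = N_X, Y = 4·X + N_Y, with N_X, N_Y ~ N(0, 1).
def sample_y(do_x=None):
    x = random.gauss(0, 1) if do_x is None else do_x  # intervene or observe
    return 4 * x + random.gauss(0, 1)

def mean(values):
    return sum(values) / len(values)

n = 100_000
print(mean([sample_y(do_x=2) for _ in range(n)]))  # ≈ 8
print(mean([sample_y(do_x=3) for _ in range(n)]))  # ≈ 12
print(mean([sample_y() for _ in range(n)]))        # ≈ 0 (observational)

# Reversed roles: intervening on Y leaves X = N_X untouched, so
# X | do(Y = a) ~ N(0, 1) for every a -- Y has no causal effect on X.
```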

SLIDE 15

Causal Effect

Definition (total causal effect)

Given an SEM S, X has a causal effect on Y ⇔ X and Y are dependent in P^{X | do(X = Ñ_X)}_S for some variable Ñ_X.

TFAE:

  • There is a causal effect from X to Y
  • There are a, b such that P^{Y | do(X = a)}_S ≠ P^{Y | do(X = b)}_S
  • There is an a such that P^{Y | do(X = a)}_S ≠ P^Y_S
  • X and Y are dependent in P^{X,Y | do(X = Ñ_X)}_S for any Ñ_X whose distribution has full support

15 / 23

SLIDE 16

Remark:

  • If there is no directed path from X to Y, then there is no causal effect
  • Sometimes there is a directed path, but no causal effect

16 / 23

SLIDE 17

Definition (Markov Property) & Theorem

Given a DAG G and a joint distribution PX, this distribution is said to satisfy

  • the global Markov property with respect to G if

A, B d-separated by C ⇒ A ⊥⊥ B | C for all disjoint sets A, B, C

  • the local Markov property with respect to G if each variable is independent of its non-descendants given its parents
  • the Markov factorization property with respect to G if

p(x) = p(x1, ..., xp) = ∏_{j=1}^p p(xj | x_{PA_j^G})

Theorem: If PX has a density p (w.r.t. a product measure), then all Markov properties above are equivalent!

17 / 23
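These properties can be checked by brute-force enumeration on a small example. A sketch on the chain X1 → X2 → X3 with hypothetical Bernoulli conditionals (the numbers are chosen here only for illustration):

```python
from itertools import product

# Conditionals of a hypothetical chain X1 -> X2 -> X3 (values assumed).
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

# Markov factorization property: p(x) = prod_j p(xj | parents of xj).
def p(x1, x2, x3):
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

assert abs(sum(p(*x) for x in product((0, 1), repeat=3)) - 1) < 1e-12

# Global Markov property: X1 and X3 are d-separated by {X2} in the chain,
# so X1 and X3 must be independent given X2.
for x2 in (0, 1):
    p2 = sum(p(a, x2, b) for a in (0, 1) for b in (0, 1))
    for x1 in (0, 1):
        for x3 in (0, 1):
            lhs = p(x1, x2, x3) / p2                        # p(x1, x3 | x2)
            p1 = sum(p(x1, x2, b) for b in (0, 1)) / p2     # p(x1 | x2)
            p3 = sum(p(a, x2, x3) for a in (0, 1)) / p2     # p(x3 | x2)
            assert abs(lhs - p1 * p3) < 1e-12
print("factorization and global Markov property both hold on the chain")
```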

SLIDE 18

Reichenbach's common cause principle can be proven using the previous definitions and theorem.

Proposition: Assume that any pair of variables X and Y can be embedded into a larger system in the following sense: there exists a correct SEM over the collection X of random variables that contains X and Y, with graph G. Then Reichenbach's common cause principle follows from the Markov property in the following sense: if X and Y are dependent, then there is

  • either a directed path from X to Y,
  • or a directed path from Y to X,
  • or a node T with a directed path from T to X and from T to Y.

18 / 23

SLIDE 19

Example:

Let the decision to study in Zurich (Z = 1) be determined only by whether one likes nature (N = 1) and whether one thinks ETH is a solid university (U = 1). How could the SEM look?

  • N = NN
  • U = NU
  • Z = (N ∨ U) ⊕ NZ
  • choose NN, NU ~ Ber(0.5) and NZ ~ Ber(0.1), all independent

19 / 23

SLIDE 20

From the SEM we can see that N and U are assumed to be independent. If you ask engineering students in Zurich (i.e. you condition on Z = 1), the answers to whether they like nature or think that ETH is a good university become anti-correlated: if someone is not a fan of nature, they probably like ETH, and vice versa (otherwise they would probably not have studied at ETH, due to Ber(0.1)). So we have N ⊥̸⊥ U | (Z = 1).

20 / 23
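The anti-correlation can be computed exactly by enumerating the eight noise configurations of the SEM above (variable names in the code are choices made here):

```python
from itertools import product

# SEM from the slide: N = N_N, U = N_U, Z = (N or U) xor N_Z,
# with N_N, N_U ~ Ber(0.5) and N_Z ~ Ber(0.1), all independent.
def prob(n, u, nz):
    return 0.5 * 0.5 * (0.1 if nz else 0.9)

# Joint distribution of (N, U) conditional on Z = 1, by exact enumeration.
joint = {}
p_z1 = 0.0
for n, u, nz in product((0, 1), repeat=3):
    z = (n | u) ^ nz
    if z == 1:
        pr = prob(n, u, nz)
        p_z1 += pr
        joint[(n, u)] = joint.get((n, u), 0.0) + pr
cond = {k: v / p_z1 for k, v in joint.items()}

e_n = sum(n * pr for (n, u), pr in cond.items())
e_u = sum(u * pr for (n, u), pr in cond.items())
cov = cond.get((1, 1), 0.0) - e_n * e_u
print(round(p_z1, 3), round(cov, 4))  # cov < 0: N and U dependent given Z = 1
```

Marginally Cov(N, U) = 0, but conditioning on the common child Z (a collider) makes N and U negatively associated.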

SLIDE 21

Truncated Factorization

Consider an SEM S with structural equations Xj = fj(X_pa(j), Nj) and density pS. We have

pS(x1, ..., xp) = ∏_{j=1}^p pS(xj | x_pa(j))

21 / 23

SLIDE 22

Truncated Factorization

Construct S̃ from S by do(Xk = Ñk):

p_{S, do(Xk = Ñk)}(x1, ..., xp) = ∏_{j=1}^p p_{S, do(Xk = Ñk)}(xj | x_pa(j)) = ∏_{j≠k} pS(xj | x_pa(j)) · p̃(xk)

Special case:

p_{S, do(Xk = a)}(x1, ..., xp) = ∏_{j≠k} pS(xj | x_pa(j)) if xk = a, and 0 otherwise

22 / 23
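The special case can be verified numerically on a minimal two-variable SEM X1 → X2 (the Bernoulli conditionals below are hypothetical, chosen only for illustration):

```python
# Truncated factorization check on a hypothetical SEM X1 -> X2.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def p_obs(x1, x2):                 # observational density p_S
    return p_x1[x1] * p_x2_given_x1[x1][x2]

def p_do_x1(a, x1, x2):            # density under do(X1 = a):
    # drop the factor p(x1), keep the remaining factors (truncation)
    return p_x2_given_x1[x1][x2] if x1 == a else 0.0

# Under do(X1 = 1), X2 follows p(x2 | x1 = 1), not the observational marginal.
marg_do = {x2: sum(p_do_x1(1, x1, x2) for x1 in (0, 1)) for x2 in (0, 1)}
marg_obs = {x2: sum(p_obs(x1, x2) for x1 in (0, 1)) for x2 in (0, 1)}
print(marg_do, marg_obs)
```

Here the interventional marginal of X2 differs from the observational one, which by the TFAE characterization on slide 15 means X1 has a causal effect on X2.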

SLIDE 23

References

Jonas Peters (2015). Causality, lecture notes.

23 / 23