High-dimensional consistency in score-based and hybrid structure learning

Marloes Maathuis
Joint work with Preetam Nandy and Alain Hauser
Structure learning

◮ We consider random variables (X_1, ..., X_p) with distribution F_0, where F_0 is multivariate Gaussian (or nonparanormal)
◮ We assume that F_0 has a perfect map G_0
◮ Based on n i.i.d. observations from F_0, we want to learn the CPDAG of G_0
Terminology...

◮ We consider directed acyclic graphs (DAGs), where each node represents a random variable
◮ A DAG encodes d-separations (Pearl). Example: X_1 → X_2 → X_3 encodes that X_1 and X_3 are d-separated by X_2
◮ A DAG G is a perfect map of a distribution F if {d-separations in G} = {conditional independencies in F}
◮ Examples:
  ◮ (X_1, X_2, X_3) with X_1 ⊥⊥ X_3 | X_2: 3 perfect maps: X_1 → X_2 → X_3, X_1 ← X_2 ← X_3, X_1 ← X_2 → X_3
  ◮ (X_1, X_2, X_3) with X_1 ⊥⊥ X_3: 1 perfect map: X_1 → X_2 ← X_3 (v-structure)
  ◮ (X_1, X_2, X_3, X_4) with X_1 ⊥⊥ X_3, X_2 ⊥⊥ X_4: no perfect map
◮ We consider distributions that have a perfect map (a sketch of how such conditional independencies can be tested in the Gaussian case follows below)
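In the Gaussian case, X_i ⊥⊥ X_j | X_S holds exactly when the partial correlation of X_i and X_j given X_S vanishes, which is commonly tested via Fisher's z-transform. The following is a minimal Python sketch, not part of the talk; the function name gaussian_ci_test, the level alpha, and the simulated example are my own illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def gaussian_ci_test(data, i, j, S, alpha=0.05):
    """Return True if X_i independent of X_j given X_S is NOT rejected.

    data: (n, p) array of i.i.d. observations; i, j: column indices;
    S: list of column indices forming the conditioning set.
    """
    n = data.shape[0]
    corr = np.corrcoef(data[:, [i, j] + list(S)], rowvar=False)
    # Partial correlation of X_i and X_j given X_S via the precision matrix.
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = np.clip(r, -1 + 1e-12, 1 - 1e-12)
    # Fisher z-transform: sqrt(n - |S| - 3) * z is approximately N(0, 1)
    # under the null hypothesis of conditional independence.
    z = 0.5 * np.log((1 + r) / (1 - r))
    return np.sqrt(n - len(S) - 3) * abs(z) <= norm.ppf(1 - alpha / 2)

# Example: for X1 -> X2 -> X3, X1 and X3 are independent given X2,
# but marginally dependent.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
x3 = x2 + rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
print(gaussian_ci_test(X, 0, 2, [1]))  # True: not rejected given X2
print(gaussian_ci_test(X, 0, 2, []))   # False: marginal dependence detected
```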
Markov equivalence classes and CPDAGs

◮ DAGs that encode the same set of d-separations form a Markov equivalence class. Example: X_1 → X_2 → X_3, X_1 ← X_2 ← X_3, X_1 ← X_2 → X_3
◮ All DAGs in a Markov equivalence class share the same skeleton and the same v-structures (a sketch of this equivalence check follows below)
◮ A Markov equivalence class can be described uniquely by a CPDAG. We want to learn the CPDAG.

[Figure: a CPDAG on X_1, X_2, X_3, X_4 and the three DAGs (DAG 1, DAG 2, DAG 3) in its Markov equivalence class]
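This characterization (due to Verma and Pearl) makes Markov equivalence easy to check mechanically: compare skeletons and v-structures. Below is a minimal sketch, not from the talk, assuming DAGs are encoded as 0/1 adjacency matrices with A[i, j] = 1 for an edge i → j; all names are illustrative.

```python
import numpy as np

def skeleton(A):
    """Undirected version of the DAG: symmetrize the adjacency matrix."""
    return ((A + A.T) > 0).astype(int)

def v_structures(A):
    """Set of triples (i, k, j), i < j, with i -> k <- j and i, j nonadjacent."""
    p = A.shape[0]
    skel = skeleton(A)
    return {(i, k, j)
            for k in range(p)
            for i in range(p) if A[i, k]
            for j in range(i + 1, p) if A[j, k] and not skel[i, j]}

def markov_equivalent(A, B):
    """Same skeleton and same v-structures <=> Markov equivalent."""
    return (skeleton(A) == skeleton(B)).all() and v_structures(A) == v_structures(B)

# The chain X1 -> X2 -> X3 and the fork X1 <- X2 -> X3 are equivalent;
# the collider X1 -> X2 <- X3 is not (it has the v-structure (X1, X2, X3)).
chain = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
fork = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]])
collider = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```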
Possible applications of DAGs/CPDAGs

◮ Efficient estimation/computation using the factorization
  f(x_1, \dots, x_p) = \prod_{j=1}^{p} f(x_j \mid \mathrm{pa}(x_j, G))
  (a numerical check of this identity follows below)
◮ Probabilistic reasoning in expert systems
◮ Causal inference
◮ ...
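As a concrete check of the factorization, consider the linear Gaussian chain X_1 → X_2 → X_3 with unit noise variances: the joint log-density computed from the implied covariance matrix agrees with the sum of the conditional log-densities. A minimal sketch, not from the talk; the coefficients and the evaluation point are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

a, b = 0.8, -0.5               # edge coefficients for X1 -> X2 and X2 -> X3
x1, x2, x3 = 0.3, -1.2, 0.7    # arbitrary evaluation point

# Covariance of (X1, X2, X3) implied by X1 ~ N(0,1), X2 = a*X1 + e2,
# X3 = b*X2 + e3 with independent standard normal noise.
cov = np.array([
    [1,     a,               a * b],
    [a,     a**2 + 1,        b * (a**2 + 1)],
    [a * b, b * (a**2 + 1),  b**2 * (a**2 + 1) + 1],
])
joint = multivariate_normal(mean=np.zeros(3), cov=cov).logpdf([x1, x2, x3])

# Factorization over the DAG: f(x1) * f(x2 | x1) * f(x3 | x2), in log form.
factored = (norm.logpdf(x1)
            + norm.logpdf(x2, loc=a * x1)
            + norm.logpdf(x3, loc=b * x2))

print(np.allclose(joint, factored))  # True
```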
CPDAG versus conditional independence graph

◮ A conditional independence graph (CIG) is an undirected graph, where X_i and X_j are adjacent ⇔ X_i and X_j are dependent given S, for S = {all remaining variables}
◮ A CPDAG is a partially directed graph, where X_i and X_j are adjacent ⇔ X_i and X_j are dependent given S, for all S ⊆ {all remaining variables}
◮ The skeleton of the CPDAG is a subgraph of the CIG
◮ The CIG can be obtained from the CPDAG by "moralization": marry unmarried parents and then make all edges undirected (a sketch follows below)

[Figure: the CPDAG on X_1, X_2, X_3, X_4, the intermediate graph after marrying unmarried parents, and the resulting CIG]
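Moralization is mechanical enough to sketch directly: for each node, connect all pairs of its parents, then drop the edge directions. A minimal sketch, not from the talk, using the same adjacency-matrix encoding as above (A[i, j] = 1 for an edge i → j); the result is a symmetric matrix.

```python
import numpy as np
from itertools import combinations

def moralize(A):
    """Moral graph of a DAG: marry unmarried parents, then undirect."""
    p = A.shape[0]
    M = ((A + A.T) > 0).astype(int)      # undirect all existing edges
    for k in range(p):
        parents = np.flatnonzero(A[:, k])
        for i, j in combinations(parents, 2):
            M[i, j] = M[j, i] = 1        # marry each pair of parents
    return M

# Collider X1 -> X2 <- X3: moralization adds the edge X1 - X3, so the
# CIG is the complete undirected graph on {X1, X2, X3}.
collider = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
print(moralize(collider))
```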
Summary of problem definition

◮ We consider random variables (X_1, ..., X_p) with distribution F_0, where F_0 is multivariate Gaussian (or nonparanormal)
◮ We assume that F_0 has a perfect map G_0
◮ Based on n i.i.d. observations from F_0, we want to learn the CPDAG of G_0
Three main approaches for structure learning

◮ Constraint-based:
  ◮ Conditional independencies in the data impose constraints on the CPDAG
  ◮ Example: PC-algorithm (Spirtes et al. '93); a sketch of its skeleton phase follows below
◮ Score-based:
  ◮ A score function is optimized over the space of DAGs/CPDAGs
  ◮ Example: greedy equivalence search (GES) (Chickering '02)
◮ Hybrid:
  ◮ A score function is optimized over a restricted space of DAGs/CPDAGs, where the restricted space is determined using conditional independence constraints
  ◮ Examples: Max-Min Hill Climbing (MMHC) (Tsamardinos et al. '06), Restricted GES (RGES: GES restricted to the estimated CIG)
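To illustrate the constraint-based idea, here is a deliberately simplified sketch of the skeleton phase of the PC-algorithm: start from the complete undirected graph and delete the edge i − j as soon as some conditioning set S (of growing size, drawn from the current neighbors of i) renders X_i and X_j conditionally independent. This omits refinements of the actual algorithm (order-independent variants, storing separation sets for orienting v-structures); it is not the talk's implementation, and all names are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.stats import norm

def gaussian_ci(data, i, j, S, alpha=0.05):
    """True if X_i independent of X_j given X_S is not rejected
    (Fisher z-test on the partial correlation, as sketched earlier)."""
    n = data.shape[0]
    prec = np.linalg.pinv(np.corrcoef(data[:, [i, j] + list(S)], rowvar=False))
    r = np.clip(-prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1]),
                -1 + 1e-12, 1 - 1e-12)
    stat = np.sqrt(n - len(S) - 3) * abs(0.5 * np.log((1 + r) / (1 - r)))
    return stat <= norm.ppf(1 - alpha / 2)

def pc_skeleton(data, alpha=0.05):
    p = data.shape[1]
    adj = ~np.eye(p, dtype=bool)          # start from the complete graph
    size = 0                              # size of the conditioning sets
    while True:
        any_tested = False
        for i in range(p):
            for j in range(p):
                if i == j or not adj[i, j]:
                    continue
                nbrs = [k for k in np.flatnonzero(adj[i]) if k != j]
                if len(nbrs) < size:
                    continue
                any_tested = True
                for S in combinations(nbrs, size):
                    if gaussian_ci(data, i, j, list(S), alpha):
                        adj[i, j] = adj[j, i] = False   # remove edge i - j
                        break
        if not any_tested:                # no edge has enough neighbors left
            break
        size += 1
    return adj.astype(int)

# Example: data from the chain X1 -> X2 -> X3. The edge X1 - X3 is removed
# at |S| = 1 with S = {X2}, leaving the true skeleton X1 - X2 - X3.
rng = np.random.default_rng(1)
n = 2000
x1 = rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
x3 = x2 + rng.standard_normal(n)
print(pc_skeleton(np.column_stack([x1, x2, x3])))
```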