High-dimensional consistency in score-based and hybrid structure learning
Marloes Maathuis, joint work with Preetam Nandy and Alain Hauser
(PowerPoint PPT presentation)


  1. High-dimensional consistency in score-based and hybrid structure learning Marloes Maathuis joint work with Preetam Nandy and Alain Hauser

  2. Structure learning ◮ We consider random variables (X1, . . . , Xp) with distribution F0, where F0 is multivariate Gaussian (or nonparanormal) ◮ We assume that F0 has a perfect map G0 ◮ Based on n i.i.d. observations from F0, we want to learn the CPDAG of G0

  3–9. Terminology
  ◮ We consider directed acyclic graphs (DAGs), where each node represents a random variable
  ◮ A DAG encodes d-separations (Pearl). Example: X1 → X2 → X3 encodes that X1 and X3 are d-separated by X2.
  ◮ A DAG G is a perfect map of a distribution F if {d-separations in G} = {conditional independencies in F}
  ◮ Examples:
    ◮ (X1, X2, X3) with X1 ⊥⊥ X3 | X2: 3 perfect maps: X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3
    ◮ (X1, X2, X3) with X1 ⊥⊥ X3: 1 perfect map: X1 → X2 ← X3 (v-structure)
    ◮ (X1, X2, X3, X4) with X1 ⊥⊥ X3, X2 ⊥⊥ X4: no perfect map
  ◮ We consider distributions that have a perfect map
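The d-separation X1 ⊥⊥ X3 | X2 encoded by the chain X1 → X2 → X3 can be checked empirically in the Gaussian case, where conditional independence corresponds to zero partial correlation. A minimal sketch, assuming linear-Gaussian mechanisms with illustrative edge weights of 0.8 (not from the slides):

```python
# Sketch: for the chain X1 -> X2 -> X3, X1 and X3 are marginally dependent
# but conditionally independent given X2 (zero partial correlation).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # edge weight 0.8 is an illustrative choice
x3 = 0.8 * x2 + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x1, x3)[0, 1])   # clearly nonzero: X1 and X3 are dependent
print(partial_corr(x1, x3, x2))    # near zero: X1 is independent of X3 given X2
```

The marginal correlation is roughly 0.45 here, while the partial correlation given X2 vanishes up to sampling noise, matching the d-separation read off the DAG.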

  10–13. Markov equivalence classes and CPDAGs
  ◮ DAGs that encode the same set of d-separations form a Markov equivalence class. Example: X1 → X2 → X3, X1 ← X2 ← X3, X1 ← X2 → X3
  ◮ All DAGs in a Markov equivalence class share the same skeleton and the same v-structures
  ◮ A Markov equivalence class can be described uniquely by a CPDAG. We want to learn the CPDAG. Example: [figure: CPDAG on X1, . . . , X4 together with DAG 1, DAG 2, DAG 3 of its equivalence class]
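The characterization above (same skeleton, same v-structures) can be checked directly on the three-node examples from the slides. A minimal sketch, representing each DAG as a set of (parent, child) edges:

```python
# Sketch: the chain/fork DAGs share skeleton and v-structures (one Markov
# equivalence class); the collider has the same skeleton but a v-structure.
def skeleton(dag):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Unshielded colliders a -> c <- b with a and b non-adjacent."""
    skel = skeleton(dag)
    return {(a, c1, b)
            for (a, c1) in dag for (b, c2) in dag
            if c1 == c2 and a < b and frozenset((a, b)) not in skel}

chain1   = {(1, 2), (2, 3)}  # X1 -> X2 -> X3
chain2   = {(3, 2), (2, 1)}  # X1 <- X2 <- X3
fork     = {(2, 1), (2, 3)}  # X1 <- X2 -> X3
collider = {(1, 2), (3, 2)}  # X1 -> X2 <- X3

print(skeleton(chain1) == skeleton(chain2) == skeleton(fork))  # True
print(v_structures(chain1))    # set(): no v-structure in the chain
print(v_structures(collider))  # {(1, 2, 3)}: the v-structure X1 -> X2 <- X3
```

This is why the chain and fork DAGs fall into one equivalence class while the collider forms its own, exactly as in the perfect-map examples.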

  14. Possible applications of DAGs/CPDAGs ◮ Efficient estimation/computation using the factorization f(x1, . . . , xp) = ∏_{j=1}^{p} f(xj | pa(xj, G)) ◮ Probabilistic reasoning in expert systems ◮ Causal inference ◮ ...
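The factorization can be verified numerically on the chain X1 → X2 → X3: the product of the conditionals f(x1) f(x2 | x1) f(x3 | x2) equals the full joint density. A minimal sketch, assuming linear-Gaussian mechanisms with an illustrative edge weight b = 0.8 and unit noise variances:

```python
# Sketch: joint density via the DAG factorization vs. the full multivariate
# Gaussian density, for X1 = e1, X2 = b*X1 + e2, X3 = b*X2 + e3.
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def joint_via_factorization(x1, x2, x3, b=0.8):
    # pa(X1) = {}, pa(X2) = {X1}, pa(X3) = {X2}
    return (normal_pdf(x1, 0.0, 1.0)
            * normal_pdf(x2, b * x1, 1.0)
            * normal_pdf(x3, b * x2, 1.0))

def joint_via_mvn(x1, x2, x3, b=0.8):
    # Covariance implied by the structural equations above
    S = np.array([[1.0,   b,             b * b],
                  [b,     1 + b * b,     b * (1 + b * b)],
                  [b * b, b * (1 + b * b), b * b * (1 + b * b) + 1]])
    x = np.array([x1, x2, x3])
    return np.exp(-0.5 * x @ np.linalg.solve(S, x)) / np.sqrt(
        (2 * np.pi) ** 3 * np.linalg.det(S))

print(joint_via_factorization(0.5, -0.2, 1.0))
print(joint_via_mvn(0.5, -0.2, 1.0))  # same value: the density factorizes
```

The computational point of the slide is that the left-hand side lives in 3 (generally p) dimensions, while each factor on the right involves only a node and its parents.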

  15–21. CPDAG versus conditional independence graph
  ◮ A conditional independence graph (CIG) is an undirected graph, where Xi and Xj are adjacent ⇔ Xi ⊥̸⊥ Xj | S for S = {all remaining variables}
  ◮ A CPDAG is a partially directed graph, where Xi and Xj are adjacent ⇔ Xi ⊥̸⊥ Xj | S for all S ⊆ {all remaining variables}
  ◮ The skeleton of the CPDAG is a subgraph of the CIG
  ◮ The CIG can be obtained from the CPDAG by “moralization”: marry unmarried parents and then make all edges undirected
  ◮ Example: [figure: CPDAG on X1, . . . , X4 → after marrying parents → CIG]
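The moralization step is mechanical enough to sketch in a few lines. A minimal version for a fully directed graph (the general CPDAG case also carries over its undirected edges unchanged), with graphs as sets of (parent, child) pairs:

```python
# Sketch of moralization: marry unmarried parents of each node, then drop
# all edge directions.
from itertools import combinations

def moralize(dag):
    undirected = {frozenset(e) for e in dag}   # drop directions
    parents = {}
    for p, c in dag:
        parents.setdefault(c, set()).add(p)
    for ps in parents.values():
        for a, b in combinations(sorted(ps), 2):
            undirected.add(frozenset((a, b)))  # marry the parents
    return undirected

# The collider X1 -> X2 <- X3 moralizes to the triangle X1 - X2 - X3 - X1,
# so the CIG gains an edge X1 - X3 that is absent from the CPDAG skeleton.
print(sorted(tuple(sorted(e)) for e in moralize({(1, 2), (3, 2)})))
# [(1, 2), (1, 3), (2, 3)]
```

The married edge X1 − X3 is exactly why the CPDAG skeleton is in general a strict subgraph of the CIG: X1 ⊥⊥ X3 marginally, but not given their common child X2.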

  22. Summary of problem definition ◮ We consider random variables (X1, . . . , Xp) with distribution F0, where F0 is multivariate Gaussian (or nonparanormal) ◮ We assume that F0 has a perfect map G0 ◮ Based on n i.i.d. observations from F0, we want to learn the CPDAG of G0

  23. Three main approaches for structure learning
  ◮ Constraint-based:
    ◮ Conditional independencies in the data impose constraints on the CPDAG
    ◮ Example: PC algorithm (Spirtes et al. ’93)
  ◮ Score-based:
    ◮ A score function is optimized over the space of DAGs/CPDAGs
    ◮ Example: greedy equivalence search (GES) (Chickering ’02)
  ◮ Hybrid:
    ◮ A score function is optimized over a restricted space of DAGs/CPDAGs, where the restricted space is determined using conditional independence constraints
    ◮ Examples: Max-Min Hill Climbing (MMHC) (Tsamardinos et al. ’06), Restricted GES (RGES: GES restricted to an estimated CIG)
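The constraint-based idea can be illustrated with a toy PC-style skeleton search: delete the edge i − j as soon as some conditioning set S makes the estimated partial correlation of Xi and Xj vanish. A minimal sketch on simulated chain data, not the full PC algorithm (no ordered adjacency sets, and a crude fixed threshold of 0.02 instead of a Fisher-z test; edge weights of 0.8 are illustrative):

```python
# Toy constraint-based skeleton search on Gaussian data from X1 -> X2 -> X3.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def pcorr(i, j, S, C):
    """Partial correlation of i and j given S, from correlation matrix C."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])     # precision of the sub-block
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

C = np.corrcoef(X, rowvar=False)
p = X.shape[1]
edges = {frozenset((i, j)) for i, j in combinations(range(p), 2)}
for i, j in combinations(range(p), 2):
    rest = [k for k in range(p) if k not in (i, j)]
    for size in range(p - 1):
        for S in combinations(rest, size):
            if abs(pcorr(i, j, S, C)) < 0.02:  # crude "independence" threshold
                edges.discard(frozenset((i, j)))

# Should recover the chain skeleton 0 - 1, 1 - 2 and drop the edge 0 - 2
print(sorted(tuple(sorted(e)) for e in edges))
```

Score-based and hybrid methods work differently: GES greedily adds and then removes edges to optimize a score such as the BIC over equivalence classes, and RGES runs the same search restricted to an estimated CIG, which is exactly the restriction the slide describes.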
