DAGs with NO TEARS: Continuous Optimization for Structure Learning
Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing
Machine Learning Department, Carnegie Mellon University
November 28, 2018
Background

Graphical models: compact models of the joint distribution p(x_1, ..., x_d).
[Figure: a directed graph over x_1, x_2, x_3, x_4 is sampled to produce a data matrix, one row per observation.]

Structure learning: what graph fits the data best?
[Figure: the reverse problem: given only the data matrix, estimate the unknown graph over x_1, ..., x_4.]
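To make the sampling direction concrete, here is a minimal sketch (my own illustration, not from the talk) of how a weighted DAG generates data under an assumed linear-Gaussian model; the weight matrix W and the helper simulate_linear_sem are hypothetical.

```python
import numpy as np

# Hypothetical weighted DAG on four nodes: W[i, j] != 0 means x_i -> x_j.
W = np.array([
    [0.0,  1.5,  0.0, 0.0],   # x_1 -> x_2
    [0.0,  0.0, -2.0, 0.0],   # x_2 -> x_3
    [0.0,  0.0,  0.0, 0.8],   # x_3 -> x_4
    [0.0,  0.0,  0.0, 0.0],
])

def simulate_linear_sem(W, n, seed=0):
    """Draw n samples from the linear SEM x = W^T x + z with z ~ N(0, I).

    Assumes the variables are indexed in topological order, so each
    column can be filled in from its parents' columns left to right.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    X = np.zeros((n, d))
    Z = rng.standard_normal((n, d))
    for j in range(d):
        X[:, j] = X @ W[:, j] + Z[:, j]
    return X

X = simulate_linear_sem(W, n=100)  # rows like those in the slide's matrix
```

Structure learning then asks for the inverse map: recover W, or at least its sparsity pattern, from X alone.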
Structure Learning: Where Are We?

                                                    MNs   BNs
  constraint-based (needs faithfulness)              ✓     ✓
  score-based, local search (combinatorial opt.)     ✓     ✓
  score-based, global search (continuous opt.)       ✓†    this work∗

† Breakthrough in Markov networks (MNs): huge success of methods like the graphical lasso, widely applied in various fields, e.g. bioinformatics.
∗ Challenges in Bayesian networks (BNs): a directed graph means an asymmetric matrix; an acyclic graph means a combinatorial constraint.
tl;dr

  max_G  score(G)               max_W  score(W)
  s.t.   G ∈ DAG        ⟺      s.t.   h(W) ≤ 0
  (combinatorial)               (smooth)

Smooth characterization of DAGs. Such a function exists:

  h(W) = tr(e^{W∘W}) − d.

Moreover, it has a simple gradient:

  ∇h(W) = (e^{W∘W})^T ∘ 2W.
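Here is a minimal NumPy/SciPy sketch of these two formulas (my own illustration, not the authors' code); scipy.linalg.expm computes the matrix exponential, and * below is the elementwise (Hadamard) product.

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """Acyclicity measure h(W) = tr(e^{W∘W}) - d.

    h(W) = 0 exactly when the weighted graph W is acyclic;
    any cycle makes the trace exceed d, so h(W) > 0.
    """
    E = expm(W * W)              # elementwise square, then matrix exponential
    return np.trace(E) - W.shape[0]

def grad_h(W):
    """Gradient ∇h(W) = (e^{W∘W})^T ∘ 2W."""
    return expm(W * W).T * 2 * W

# Sanity check: a 2-cycle violates the constraint, a single edge does not.
W_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
W_dag   = np.array([[0.0, 1.0], [0.0, 0.0]])
print(h(W_cycle))   # ≈ 1.086 > 0
print(h(W_dag))     # 0.0
```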
NO TEARS

Available at: github.com/xunzheng/notears
The whole algorithm is roughly 50 lines of code: about 30 lines for the function and gradient plus about 20 lines for the optimization. Existing algorithms typically require far more than 1000 lines.
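The "optimize" half can be sketched as an augmented-Lagrangian loop around an off-the-shelf solver. The sketch below is illustrative rather than the repository's exact code: it uses a least-squares score, omits the l1 penalty from the paper, and its parameter values (rho schedule, tolerances) are placeholder assumptions.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def notears_sketch(X, max_outer=10, rho_max=1e16, h_tol=1e-8):
    """Solve min_W 0.5/n ||X - XW||_F^2  s.t.  h(W) = 0 (illustrative)."""
    n, d = X.shape
    rho, alpha = 1.0, 0.0
    w = np.zeros(d * d)

    def h_of(W):
        return np.trace(expm(W * W)) - d

    def aug_lagrangian(w_flat):
        W = w_flat.reshape(d, d)
        loss = 0.5 / n * ((X - X @ W) ** 2).sum()
        hW = h_of(W)
        return loss + 0.5 * rho * hW ** 2 + alpha * hW

    for _ in range(max_outer):
        w = minimize(aug_lagrangian, w, method="L-BFGS-B").x  # inner solve
        hW = h_of(w.reshape(d, d))
        if abs(hW) <= h_tol or rho >= rho_max:
            break
        alpha += rho * hW   # dual ascent step
        rho *= 10           # tighten the acyclicity penalty
    return w.reshape(d, d)

W_est = notears_sketch(np.random.default_rng(0).standard_normal((100, 4)))
```

Thresholding the small entries of the returned matrix then gives the estimated edge set.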
Results: Recovering an Erdős-Rényi Graph

[Figure: three 16×16 weighted-adjacency heatmaps on a color scale from −2 to +2: the ground-truth W, the estimate Ŵ from this work, and the baseline estimate Ŵ_FGS from FGS.]
Results: Recovering a Scale-free Graph

[Figure: the same three heatmaps for a scale-free graph, on the same −2 to +2 color scale: ground truth W, the estimate Ŵ from this work, and the FGS baseline Ŵ_FGS.]
Summary

A smooth characterization of DAGs:

  h(W) = tr(e^{W∘W}) − d ≤ 0  ⟺  G(W) ∈ DAG

Use existing solvers for the constrained optimization problem:

  max_W  score(W)   s.t.  h(W) ≤ 0

Bridges optimization and structure learning.
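As a quick sanity check of the characterization (my own illustration; assumes networkx is installed), one can compare h(W) against a standard graph-theoretic acyclicity test on random weighted digraphs:

```python
import numpy as np
import networkx as nx
from scipy.linalg import expm

rng = np.random.default_rng(0)
for _ in range(100):
    # Random sparse digraph on 6 nodes with weights bounded away from 0.
    mask = rng.random((6, 6)) < 0.3
    W = mask * rng.uniform(0.5, 2.0, (6, 6)) * rng.choice([-1.0, 1.0], (6, 6))
    np.fill_diagonal(W, 0.0)

    h = np.trace(expm(W * W)) - 6
    G = nx.from_numpy_array((W != 0).astype(int), create_using=nx.DiGraph)
    assert (h <= 1e-8) == nx.is_directed_acyclic_graph(G)
print("h(W) ≤ 0 agreed with the DAG test on all samples")
```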