learning discrete graphical models with neural networks
play

Learning Discrete Graphical Models with Neural Networks Andrey - PowerPoint PPT Presentation

Slide 1 Learning Discrete Graphical Models with Neural Networks Andrey Lokhov joint work with Abhijith Jayakumar, Sidhant Misra, Marc Vuffray UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energys NNSA


  1. Slide 1 Learning Discrete Graphical Models with Neural Networks Andrey Lokhov joint work with Abhijith Jayakumar, Sidhant Misra, Marc Vuffray UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  2. Slide 2 Graphical Models Probability distribution ! " has conditional dependency structure according to a given graph Factorization property Separation property " ( " # " # " ( " ' " % " $ " ' " % " $ " & " & ! " ∝ exp 1 : 2 (" 2 ) " ( |(" ' , " # ) is independent of (" % , " & , " $ ) 2∈2456789 UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  3. Slide 3 Graphical Model Learning Informally Unsupervised learning task Dimensions of the problem - Number of samples: % - Observe draws of random vectors ! - Number of variables: & - Learn structure and parameters of - Alphabet size: ' a positive distribution " ! > 0 (! ) ∈ 1, … , ' ) Prior work in computationally efficient learning Convex optimization based methods Mutual Information based greedy methods Vuffray, Misra, Lokhov ( 2016 , 2018 ) Bresler ( 2015 ) Klivans, Meka ( 2017 ) Hamilton, Koehler, Moitra ( 2017 ) Wu, Sanghavi, Dimakis ( 2019 ) UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  4. Slide 4 Setting of Graphical Model Learning The model has a parametric form: - Observe random draws of " - Recover parameters ∗ - ( (" ( ) ≤ 5 ! " ∝ exp ' + ( 2 + − + ∗ 2 (∈* Basis functions are centered: Prior ℓ 8 -bound on parameters: ∗ ≤ < ∗ + 9 8 = ' + ( = ' - ( " ( = 0, 0 ∈ 1 (∋9 > ? UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  5. Slide 5 Method for Solving the Inverse problem: GRISE Generalized Regularized Interaction Screening (GRISE) Arbitrary parametric form C 1 ∗ - ( (" ( ) 5 A ) + 0 = arg min @ ' exp − ' + ( - ( (" ( ! " ∝ exp ' + ( = > AB1 (∈* > (∈* s.t. + 0 1 ≤ 3 4 Local Reconstruction (one neighborhood at a time) Convex Function (with low complexity minimization using entropic descent) UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  6. Slide 6 Intuition Behind GRISE: Infinite Sample Size Limit ∗ + 1 0 1 〈 S i *(J i ,H i ) 〉 ∗ - ( (" ( ) ! " ∝ exp ' + ( 〈 S i *(J i =0,H i =0) 〉 =1 (∈* i i 2→4 0 1 ∗ + 1 = 6 exp 9 0 1 + 1 − ' + ( - ( " ( (∈* 8 (=) + 1 ∗ + 1 J i ∗ 0 1 〈 S i *(J i *,H i *) 〉 ∗ + 1 ∗ = 0 H i ∇ ; 8 0 1 (<) i + 1 UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  7. Slide 7 Theorem for Learning Gibbs Distributions with GRISE (Informal) With high probability, GRISE estimates : ≤ & ! " − " ∗ 2 with a number of samples : + , -. log 2 /& 4 ( = * and computational complexity : * + 2 . Precise finite sample analysis with proofs: arXiv:1902.00600 UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  8. Slide 8 Complete Basis Function Hierarchies: Monomial Basis Example # < # % ∗ # % # % + ∗ # % # ∗ # % + - # ∝ exp 2 4 % 2 4 %' 2 4 %'" ' # " + ⋯ %∈3 (%,')∈8 9 (%,',")∈8 : # = # " # ' Binary alphabet # ∈ −1, +1 , ! " # " ∈ # % , # % # ' , # % # ' # " , … Monomial basis functions UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  9. Slide 9 Complete Basis Function Hierarchies: Monomial Basis Example " 7 " ( ∗ " ( " ( + ∗ " ( " ∗ " ( + ! " ∝ exp ' + ( ' + (0 ' + (04 0 " 4 + ⋯ (∈* ((,0)∈2 3 ((,0,4)∈2 5 " 8 " 4 " 0 Interaction Screening Loss: M 1 > + ( = arg min I ' exp −" ( + ( + ' + (0 " ( + ' + (04 " 0 " 4 + ⋯ F G JKL 0 0,4 ; < = . For 9 -wise models, the computational complexity of GRISE is : UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  10. Slide 10 Neural Net Parametrization of the Partial Energy Function Interaction Screening Loss: 3 1 > ? # = arg min . / exp −8 # ? # + / ? #B 8 # + / ? #BD 8 B 8 D + ⋯ @ , 012 B B,D Neural Net Interaction Screening Loss: 3 1 " # = arg min ! . / exp −8 # ΝΝ(8\8 # ; " # ) + , 012 If Neural Net is expressive enough, the global minima of NN-GRISE loss are interaction screening minima corresponding to recovered local energy UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  11. Slide 11 Illustration on a small ( ! = #$ ) tractable model of order % = & NN-GRISE hierarchy contains higher-order polynomials in its hypothesis space NN-GRISE explores a different basis functions hierarchy, and gets close to the true model with less parameters UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  12. Slide 12 Comparison of conditional distributions for a larger problem For p=15, L=6 problem, monomial basis contains 3472 terms, and GRISE becomes intractable Only order L=4 is practically feasible with GRISE NN basis has less parameters (349) and uses less training samples UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  13. Slide 13 Structure Learning with NN-GRISE 3 1 (2) " # = arg min ! . / exp −8 # ΝΝ(8\8 # ; " # ) + ? " # 2 + , 012 Regularization through penalty on first layer weights Variables @ outside of the neighborhood of A do not influence the output at the interaction screening minima UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  14. Slide 14 Summary - GRISE is a convex estimator for learning arbitrary discrete graphical models with rigorous guarantees, improving upon sampling complexities of previous methods Efficient Learning of Discrete Graphical Models M. Vuffray, S. Misra, A. Y. Lokhov (2020) - NN-GRISE is a computationally efficient non-convex estimator that uses the non-linear representation power of Neural Nets to exploit sparse basis hierarchies - NN-GRISE can still learn the MRF structure, full energy function representation, and conditional distributions that can be used for re-sampling from the learned model Learning of Discrete Graphical Models with Neural Networks Abhijith J., A. Y. Lokhov, S. Misra, M. Vuffray (2020) UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

  15. Slide 15 Questions? UNCLASSIFIED Managed by Triad National Security, LLC for the U.S. Department of Energy’s NNSA

Recommend


More recommend