  1. Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams. Laura I. Galindez Olascoaga ✻, Wannes Meert ✻, Nimish Shah ✻, Guy Van den Broeck ✣, Marian Verhelst ✻

  2. Outline: Motivation and objective; Background; Discriminative bias for learning PSDDs; Experimental results; Conclusions. (IDA2020)

  3. Motivation:
  - Probabilistic inference has proven to be well suited for resource-constrained embedded applications (Galindez et al., 2019).
  - Probabilistic circuits successfully balance efficiency vs. expressiveness trade-offs while remaining robust.
  - Some of these models' robustness (which comes from generative learning) is at odds with discriminative performance.

  4. Objective: Keep the robustness provided by generative learning strategies, and improve discriminative performance by exploiting the models' knowledge encoding capabilities.

  5. Outline: Motivation and objective; Background; Discriminative bias for learning PSDDs; Experimental results; Conclusions.

  6. Background: probabilistic inference. Given a probabilistic model m of the world, answer probabilistic queries:
  - Evidence query: q1(m) = Pr_m(e)
  - Conditional query: q2(m) = Pr_m(q | e)
  - MAP query: q3(m) = argmax_x Pr_m(x)
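The three query types above can be sketched on a tiny joint distribution. The variable names (Rain, Sun) and all probabilities below are hypothetical stand-ins, since the slide's concrete query arguments were figures that did not survive extraction.

```python
# Hypothetical joint distribution over two binary variables (Rain, Sun);
# the numbers are illustrative only.
joint = {
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.1,
}

def pr_evidence(rain):
    """Evidence query: Pr(Rain = rain), marginalizing out Sun."""
    return sum(p for (r, _), p in joint.items() if r == rain)

def pr_conditional(sun, rain):
    """Conditional query: Pr(Sun = sun | Rain = rain)."""
    return joint[(rain, sun)] / pr_evidence(rain)

def map_query():
    """MAP query: the most likely complete assignment (rain, sun)."""
    return max(joint, key=joint.get)

print(pr_evidence(1))        # marginal Pr(Rain = 1)
print(pr_conditional(0, 1))  # Pr(Sun = 0 | Rain = 1)
print(map_query())           # most likely joint state
```

Tractability (next slide) is about how the cost of these computations scales: on an explicit joint table they are exponential in the number of variables, which is what probabilistic circuits avoid.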

  7. Background: tractable probabilistic inference. A query q(m) is tractable iff exactly computing it runs in time O(poly(|m|)). There is an inherent trade-off between tractability and expressiveness. (From the UAI 2019 tutorial on Tractable Probabilistic Models by Vergari, Di Mauro and Van den Broeck, and the AAAI 2020 tutorial on Probabilistic Circuits by Vergari, Choi, Peharz and Van den Broeck.)

  8. Background: probabilistic circuits. A probabilistic circuit is a computational graph that encodes a probability distribution p(X). (From the UAI 2019 tutorial on Tractable Probabilistic Models by Vergari, Di Mauro and Van den Broeck, and the AAAI 2020 tutorial on Probabilistic Circuits by Vergari, Choi, Peharz and Van den Broeck.)

  9. Background: what is a PSDD? PSDDs are probabilistic extensions of SDDs, which represent Boolean functions as logical circuits (Kisa et al., 2014). [Figure: a Bayesian network and the equivalent PSDD encoding its piecewise-defined distribution; example from Liang et al., 2017.]

  10. Background: PSDDs' properties. The left input of an AND gate is the prime (p) and the right is the sub (s). Edges of decision nodes are annotated with a normalized probability distribution (e.g. θ1 = 0.2, θ2 = 0.8). [Figure: a decision node with elements (p1, s1), (p2, s2), and the corresponding vtree.]

  11. Background: PSDDs' properties. Syntactic restrictions (see Kisa et al., 2014):
  1) Decomposability: the inputs of an AND node must range over disjoint variable sets. For example, at vtree node 1: prime variables X = {Rain}, sub variables Y = {Sun, Rbow}.
  2) Determinism: only one of a decision node's inputs can be true.

  12. Background: PSDDs' properties. Decision nodes q encode the distribution:
  Pr_q(X, Y) = Σ_i θ_i · Pr_{p_i}(X) · Pr_{s_i}(Y)
  Conditioning on an element's base:
  Pr_q(X, Y | [p_i]) = Pr_{p_i}(X | [p_i]) · Pr_{s_i}(Y | [p_i]) = Pr_{p_i}(X) · Pr_{s_i}(Y)
  where [p_i] is a logical sentence that defines the support of the node's distribution.

  13. Background: PSDDs' properties. Decision nodes q encode the distribution:
  Pr_q(X, Y) = Σ_i θ_i · Pr_{p_i}(X) · Pr_{s_i}(Y)
  For example, at node 1, with prime variables X = {Rain} and sub variables Y = {Sun, Rbow}:
  Pr_1(X, Y) = 0.2 · Pr_{p_1}(X) · Pr_{s_1}(Y) + 0.8 · Pr_{p_2}(X) · Pr_{s_2}(Y)
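The decision-node mixture above can be sketched directly. The element distributions below are hypothetical stand-ins for the slide's figure (a single prime variable Rain and one sub variable Sun, instead of the slide's two); the mixture weights 0.2 and 0.8 are from the slide.

```python
# Element distributions of a toy decision node. Determinism: the two
# primes have disjoint support (Rain = 1 vs. Rain = 0), so at most one
# element of the mixture is non-zero for any assignment.
def pr_p1(rain):            # first prime: Pr(Rain = 1) = 1
    return 1.0 if rain == 1 else 0.0

def pr_p2(rain):            # second prime: Pr(Rain = 0) = 1
    return 1.0 if rain == 0 else 0.0

def pr_s1(sun):             # sub distribution paired with the first prime
    return 0.1 if sun == 1 else 0.9

def pr_s2(sun):             # sub distribution paired with the second prime
    return 0.7 if sun == 1 else 0.3

THETAS = [0.2, 0.8]
ELEMENTS = [(pr_p1, pr_s1), (pr_p2, pr_s2)]

def pr_node(rain, sun):
    """Pr_q(X, Y) = sum_i theta_i * Pr_pi(X) * Pr_si(Y)."""
    return sum(t * p(rain) * s(sun) for t, (p, s) in zip(THETAS, ELEMENTS))

# Only the Rain = 1 element fires here: theta_1 * Pr_s1(Sun = 1)
print(pr_node(1, 1))
# The mixture is normalized over all four assignments:
print(sum(pr_node(r, s) for r in (0, 1) for s in (0, 1)))
```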

  14. Background: learning PSDDs. The LearnPSDD algorithm (Liang et al., 2017) learns the PSDD structure incrementally from data:
  1) Learn a vtree from data (minimizing mutual information).
  2) Iteratively apply split and clone operations: generate candidate operations, calculate the log-likelihood improvement of each, and execute the best one.
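The greedy outer loop of step 2 can be sketched as below. This is a hedged sketch, not the paper's implementation: the model here is a single Bernoulli parameter so the loop is runnable, whereas real LearnPSDD proposes split/clone operations on circuit nodes; `generate_candidates` and the stopping rule are hypothetical placeholders.

```python
import math

def log_likelihood(theta, data):
    """Training-set log-likelihood of a toy one-parameter model."""
    return sum(math.log(theta if x else 1.0 - theta) for x in data)

def generate_candidates(theta):
    # Real LearnPSDD generates candidate split/clone operations on the
    # circuit; here a "candidate" is simply a nearby parameter value.
    return [min(0.99, theta + 0.05), max(0.01, theta - 0.05)]

def learn(theta, data, iterations=20):
    """Greedy search: score candidates, keep the best, stop on no gain."""
    for _ in range(iterations):
        best = max(generate_candidates(theta),
                   key=lambda c: log_likelihood(c, data))
        if log_likelihood(best, data) <= log_likelihood(theta, data):
            break  # no candidate improves the fit: stop
        theta = best
    return theta

data = [1, 1, 1, 0]                  # empirical Pr(X = 1) = 0.75
print(round(learn(0.5, data), 2))    # converges near the empirical frequency
```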

  15. Outline: Motivation and objective; Background; Discriminative bias for learning PSDDs; Experimental results; Conclusions.

  16. Classification with PSDDs. Given a feature variable set F and a class variable C, the classification task can be stated as a probabilistic query:
  Pr(C | F) ∝ Pr(F | C) · Pr(C)
  LearnPSDD remains agnostic to the classification task: with LearnPSDD, features might never be conditioned on the class.
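The classification query above is Bayes' rule with the normalizing constant dropped. A minimal sketch for a binary class, assuming hypothetical prior and class-conditional likelihood values (the observed feature assignment f is fixed and these numbers are illustrative, not from the paper):

```python
PRIOR = {0: 0.6, 1: 0.4}        # Pr(C)
LIKELIHOOD = {0: 0.1, 1: 0.5}   # Pr(F = f | C) for the observed assignment f

def classify():
    """Score each class by Pr(F | C) * Pr(C), then normalize."""
    scores = {c: LIKELIHOOD[c] * PRIOR[c] for c in PRIOR}
    z = sum(scores.values())
    posterior = {c: s / z for c, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

label, post = classify()
print(label)   # class 1 wins: 0.5 * 0.4 beats 0.1 * 0.6
print(post)
```

In a PSDD both factors are answered by circuit evaluation; the point of the next slides is that generatively learned structure may never represent Pr(F | C) explicitly.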

  17. Bayesian network classifiers. Effects of explicitly conditioning F on C in Pr(C | F) ∝ Pr(F | C) · Pr(C): with LearnPSDD, features might never be conditioned on the class; with Bayesian network classifiers, features are always conditioned on the class. [Figure: two graphical structures over C and features F1..F4.]

  18. Enforcing the discriminative bias: D-LearnPSDD. Make sure that the feature variables F can be conditioned on the class variable C. Minimize conditional mutual information.

  19. Enforcing the discriminative bias: D-LearnPSDD. Make sure that the feature variables F can be conditioned on the class variable C. Minimize conditional mutual information, initializing on a fully factorized distribution.

  20. Enforcing the discriminative bias: D-LearnPSDD. Make sure that the feature variables F can be conditioned on the class variable C. However, setting the vtree alone is not enough: F remains independent of C.

  21. Enforcing the discriminative bias: D-LearnPSDD. Make sure that the feature variables F can be conditioned on the class variable C. o Set …

  22. Enforcing the discriminative bias: D-LearnPSDD. Make sure that the feature variables F can be conditioned on the class variable C. o Set … o LearnPSDD ensures that the base of the root node remains unchanged. This encodes a naive Bayes structure.

  23. Outline: Motivation and objective; Background; Discriminative bias for learning PSDDs; Experimental results; Conclusions.

  24. Experimental results. Setup: 15 UCI datasets; 5-fold cross-validation; average accuracy reported over a range of model sizes, where model size is the number of parameters.
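The evaluation protocol above (k-fold cross-validation, accuracy averaged over folds) can be sketched as follows. The classifier here is a trivial majority-class stand-in and the labels are made up; this only illustrates the fold-splitting and averaging, not the paper's D-LearnPSDD models.

```python
def k_fold_accuracy(labels, predict, k=5):
    """Split labels into k contiguous folds; train on k-1, test on 1."""
    fold = len(labels) // k
    accs = []
    for i in range(k):
        test = labels[i * fold:(i + 1) * fold]
        train = labels[:i * fold] + labels[(i + 1) * fold:]
        guess = predict(train)  # a constant predictor, for illustration
        accs.append(sum(y == guess for y in test) / len(test))
    return sum(accs) / k        # average accuracy across folds

# Majority-class baseline: predict the most frequent training label.
majority = lambda ys: max(set(ys), key=ys.count)
print(k_fold_accuracy([1, 1, 0, 1, 0, 1, 1, 0, 1, 1], majority))
```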

  25. Experimental results.

  26. Experimental results. D-LearnPSDD remains robust against missing features.

  27. Outline: Motivation and objective; Background; Discriminative bias for learning PSDDs; Experimental results; Conclusions.

  28. Conclusions. We introduced a PSDD learning technique that improves classification performance by introducing a discriminative bias. Robustness is ensured by exploiting the generative learning strategy. The proposed technique outperforms purely generative PSDDs in terms of classification accuracy, and the other baseline classifiers in terms of robustness.

  29. References
  - Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst and Guy Van den Broeck. Towards Hardware-Aware Tractable Learning of Probabilistic Models. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.
  - YooJung Choi, Antonio Vergari, Robert Peharz and Guy Van den Broeck. Probabilistic Circuits: Representation and Inference. AAAI tutorial, 2020.
  - Yitao Liang, Jessa Bekker and Guy Van den Broeck. Learning the Structure of Probabilistic Sentential Decision Diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
  - Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014.
  Thank you! Contact: laura.galindez@esat.kuleuven.be
