Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams
Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Guy Van den Broeck, Marian Verhelst
IDA 2020
Outline
- Motivation and objective
- Background
- Discriminative bias for learning PSDDs
- Experimental results
- Conclusions
Motivation
- Probabilistic inference has proven to be well suited for resource-constrained embedded applications (Galindez Olascoaga et al., 2019).
- Probabilistic circuits successfully balance efficiency vs. expressiveness trade-offs while remaining robust.
- Some of these models’ robustness, which comes from generative learning, is at odds with discriminative performance.
Objective
- Keep the robustness provided by generative learning strategies.
- Improve discriminative performance by exploiting PSDDs’ knowledge encoding capabilities.
Background: probabilistic inference
Given a probabilistic model m of the world, answer probabilistic queries, for example (with e the observed evidence and q the query variables):
- Evidence: $q_1(m) = \Pr_m(e)$
- Conditional: $q_2(m) = \Pr_m(q \mid e)$
- MAP: $q_3(m) = \arg\max_q \Pr_m(q \mid e)$
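Each of these query types can be made concrete on a tiny example. Below is a minimal sketch, not from the slides, that answers all three by brute-force summation over an explicit joint distribution; a probabilistic circuit answers the same queries without materializing the full table. All numbers are illustrative.

```python
# Toy joint distribution Pr(Rain, Sun) as an explicit table
# (illustrative numbers only).
joint = {
    (True, True): 0.02, (True, False): 0.18,
    (False, True): 0.56, (False, False): 0.24,
}

def pr(event):
    """Pr(event), where event maps variable index -> required value."""
    return sum(p for assignment, p in joint.items()
               if all(assignment[i] == v for i, v in event.items()))

q1 = pr({0: True})                           # evidence: Pr(Rain)
q2 = pr({0: True, 1: True}) / pr({0: True})  # conditional: Pr(Sun | Rain)
q3 = max([True, False],                      # MAP: argmax_s Pr(Sun=s | Rain)
         key=lambda s: pr({0: True, 1: s}))

print(q1, q2, q3)  # approximately 0.2, 0.1, False
```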
Background: tractable probabilistic inference
A query q(m) is tractable iff exactly computing it runs in time O(poly(|m|)). There is an inherent trade-off between tractability and expressiveness.
(From the UAI 2019 tutorial on Tractable Probabilistic Models by Vergari, Di Mauro and Van den Broeck, and the AAAI 2020 tutorial on Probabilistic Circuits by Vergari, Choi, Peharz and Van den Broeck.)
Background: probabilistic circuits
A probabilistic circuit is a computational graph that encodes a probability distribution p(X).
(From the same UAI 2019 and AAAI 2020 tutorials.)
Background: what is a PSDD?
PSDDs are probabilistic extensions of SDDs, which represent Boolean functions as logical circuits (Kisa et al., 2014). The running example (from Liang et al., 2017) is a distribution over Rain, Sun and Rbow (rainbow) that can be represented either as a Bayesian network or as a PSDD:
- Pr(Rain) = 0.2
- Pr(Sun | Rain) = 0.1, Pr(Sun | ¬Rain) = 0.7
- Pr(Rbow) = 1 if Rain ∧ Sun, and Pr(Rbow) = 0 otherwise
The PSDD encodes the same distribution with edge parameters (0.2/0.8, 0.1/0.9, 0.7, 1.0).
Background: PSDDs’ properties
(Figure: a decision node with parameters θ₁ = 0.2, θ₂ = 0.8, and the corresponding vtree.)
- The left input of an AND gate is the prime (p) and the right is the sub (s).
- Edges of decision nodes are annotated with a normalized probability distribution.
Background: PSDDs’ properties
Syntactic restrictions (see Kisa et al., 2014):
1) Decomposability: the inputs of an AND node must be over disjoint variable sets. For example, at node 1: prime variables X = {Rain}, sub variables Y = {Sun, Rbow}.
2) Determinism: only one of a decision node’s inputs can be true.
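These two restrictions are what make PSDD inference tractable, and both are mechanical to check. A minimal sketch for the example node above; the element dictionaries and field names are hypothetical, not LearnPSDD’s actual data structures:

```python
# Each element of a decision node pairs a prime with a sub.
# Primes are Boolean functions over an assignment; scopes record
# which variables each input mentions.
elements = [
    {"prime": lambda a: a["Rain"], "prime_scope": {"Rain"},
     "sub_scope": {"Sun", "Rbow"}},
    {"prime": lambda a: not a["Rain"], "prime_scope": {"Rain"},
     "sub_scope": {"Sun", "Rbow"}},
]

def decomposable(elements):
    # Decomposability: prime and sub variable sets are disjoint.
    return all(not (e["prime_scope"] & e["sub_scope"]) for e in elements)

def deterministic(elements, assignment):
    # Determinism: only one prime is true under any complete assignment
    # (here the primes Rain / not-Rain are exhaustive, so exactly one).
    return sum(e["prime"](assignment) for e in elements) == 1

assignment = {"Rain": True, "Sun": False, "Rbow": False}
print(decomposable(elements), deterministic(elements, assignment))  # True True
```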
Background: PSDDs’ properties
A decision node q encodes the distribution
$\Pr_q(\mathbf{X}\mathbf{Y}) = \sum_i \theta_i \Pr_{p_i}(\mathbf{X}) \Pr_{s_i}(\mathbf{Y})$,
and conditioning on the i-th prime gives
$\Pr_q(\mathbf{X}\mathbf{Y} \mid [p_i]) = \Pr_{p_i}(\mathbf{X} \mid [p_i]) \Pr_{s_i}(\mathbf{Y} \mid [p_i]) = \Pr_{p_i}(\mathbf{X}) \Pr_{s_i}(\mathbf{Y})$,
where $[\cdot]$ denotes the base: a logical sentence that defines the support of a node’s distribution.
Background: PSDDs’ properties
For example, at node 1 (with prime variables X = {Rain} and sub variables Y = {Sun, Rbow}, and $\theta_1 = 0.2$, $\theta_2 = 0.8$), the decision-node distribution instantiates to
$\Pr_1(\mathbf{X}\mathbf{Y}) = 0.2 \cdot \Pr_{p_1}(\mathbf{X}) \Pr_{s_1}(\mathbf{Y}) + 0.8 \cdot \Pr_{p_2}(\mathbf{X}) \Pr_{s_2}(\mathbf{Y})$
$\phantom{\Pr_1(\mathbf{X}\mathbf{Y})} = 0.2 \cdot \Pr_{p_1}(\mathbf{X} \mid [\text{Rain}]) \Pr_{s_1}(\mathbf{Y} \mid [\text{Rain}]) + 0.8 \cdot \Pr_{p_2}(\mathbf{X} \mid [\neg\text{Rain}]) \Pr_{s_2}(\mathbf{Y} \mid [\neg\text{Rain}])$
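Read as code, a decision node is a small mixture whose components are products of prime and sub distributions. The sketch below evaluates node 1 of the rainbow example bottom-up; the parameters follow the reconstruction above and should be treated as illustrative:

```python
# Pr_q(XY) = sum_i theta_i * Pr_pi(X) * Pr_si(Y), evaluated for the
# Rain/Sun/Rbow example with the slide's parameters.

def pr_node1(rain, sun, rbow):
    # Element 1: prime Rain (theta_1 = 0.2); its sub encodes
    # Pr(Sun | Rain) = 0.1 and Rbow <=> (Rain and Sun).
    pr_elem1 = (0.2 if rain else 0.0) \
        * (0.1 if sun else 0.9) \
        * (1.0 if rbow == (rain and sun) else 0.0)
    # Element 2: prime not-Rain (theta_2 = 0.8); its sub encodes
    # Pr(Sun | not Rain) = 0.7 and Rbow always false.
    pr_elem2 = (0.8 if not rain else 0.0) \
        * (0.7 if sun else 0.3) \
        * (1.0 if not rbow else 0.0)
    # Determinism: at most one element is non-zero for any input.
    return pr_elem1 + pr_elem2

print(pr_node1(True, True, True))    # 0.2 * 0.1 * 1.0 = 0.02
print(pr_node1(False, True, False))  # 0.8 * 0.7 * 1.0 = 0.56
```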
Background: learning PSDDs
The LearnPSDD algorithm (Liang et al., 2017) learns the PSDD structure incrementally from data:
1) Learn a vtree from data (minimizing mutual information).
2) Iteratively apply split and clone operations:
   - generate candidate operations,
   - calculate the log-likelihood improvement of each,
   - execute the best operation.
A sketch of this search loop follows below.
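A high-level sketch of that greedy loop, with every PSDD-specific step (candidate generation, operation execution, likelihood computation) passed in as a hypothetical callable; the scoring details of the actual algorithm are simplified here:

```python
def learn_psdd(data, psdd, candidate_ops, apply_op, log_likelihood,
               iterations=100):
    """Greedy structure search in the style of LearnPSDD (Liang et al., 2017).

    The PSDD-specific machinery is supplied as callables, since a real
    implementation is far beyond this sketch:
      candidate_ops(psdd)        -> list of candidate split/clone operations
      apply_op(psdd, op)         -> new PSDD with the operation executed
      log_likelihood(psdd, data) -> training log-likelihood
    """
    best_ll = log_likelihood(psdd, data)
    for _ in range(iterations):
        candidates = candidate_ops(psdd)
        if not candidates:
            break
        # Score each candidate by the log-likelihood it achieves.
        scored = [(log_likelihood(apply_op(psdd, op), data), op)
                  for op in candidates]
        ll, op = max(scored, key=lambda t: t[0])
        if ll <= best_ll:
            break  # no operation improves the fit; stop greedily
        psdd, best_ll = apply_op(psdd, op), ll
    return psdd
```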
Classification with PSDDs
Given a feature variable set F and a class variable C, the classification task can be stated as a probabilistic query:
$\Pr(C \mid \mathbf{F}) \propto \Pr(\mathbf{F} \mid C) \cdot \Pr(C)$
LearnPSDD, however, remains agnostic to the classification task: with LearnPSDD, features might never be conditioned on the class.
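Under a naive Bayes factorization of Pr(F | C), which is the structure D-LearnPSDD will later enforce at the root, this query is a short computation. A minimal sketch with illustrative parameters over binary features:

```python
import math

# Illustrative parameters: Pr(C) and Pr(F_j = 1 | C) for two classes.
prior = {0: 0.6, 1: 0.4}
likelihood = {0: [0.9, 0.2, 0.5], 1: [0.3, 0.7, 0.8]}

def classify(features):
    """Return argmax_c Pr(c | features) via Pr(features | c) * Pr(c)."""
    scores = {}
    for c in prior:
        logp = math.log(prior[c])
        for fj, theta in zip(features, likelihood[c]):
            logp += math.log(theta if fj else 1.0 - theta)
        scores[c] = logp
    return max(scores, key=scores.get)

print(classify([1, 0, 1]))  # -> 0
```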
Bayesian network classifiers
Effects of explicitly conditioning F on C in $\Pr(C \mid \mathbf{F}) \propto \Pr(\mathbf{F} \mid C) \cdot \Pr(C)$:
- With LearnPSDD, features might never be conditioned on the class.
- With Bayesian network classifiers, features are always conditioned on the class.
(Figure: Bayesian network classifier structure with the class variable C as parent of the features F1, ..., F4.)
Enforcing the discriminative bias: D-LearnPSDD
Goal: make sure that the feature variables F can be conditioned on the class variable C.
- Learn the vtree by minimizing conditional mutual information, and initialize on a fully factorized distribution.
- However, setting the vtree alone is not enough: F can still be independent from C.
- Therefore, set the root decision node so that its primes are the class values, making every sub distribution over F conditioned on a value of C. This encodes a naive Bayes structure (written out below).
- LearnPSDD ensures that the base of the root node remains unchanged, so the bias is preserved throughout learning.
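Written out, the biased root is a decision node whose elements branch on the class value (shown here for a binary class; the notation follows the decision-node distribution from the background section):

```latex
% Root decision node with the class variable C as prime:
% one element per class value, so each sub is conditioned on C.
\Pr_{\mathrm{root}}(C, \mathbf{F})
  = \theta \cdot \Pr_{s_1}(\mathbf{F} \mid C{=}1)
  + (1 - \theta) \cdot \Pr_{s_2}(\mathbf{F} \mid C{=}0)
```

With the fully factorized initialization, each sub further decomposes as $\Pr_{s_i}(\mathbf{F} \mid C{=}c) = \prod_j \Pr(F_j \mid C{=}c)$, which is exactly the naive Bayes structure the slide refers to; subsequent split and clone operations relax this factorization without touching the root’s base.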
Experimental results
Setup:
- 15 UCI datasets
- 5-fold cross-validation
- Average accuracy reported over a range of model sizes
- Model size is the number of parameters
(Results figure: average classification accuracy versus model size for the compared methods.)
Experimental results
D-LearnPSDD remains robust against missing features.
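One source of that robustness is generative semantics: missing features can be marginalized out of the query instead of imputed. A minimal sketch, reusing the illustrative naive Bayes parameters from the classification sketch above:

```python
import math

prior = {0: 0.6, 1: 0.4}
likelihood = {0: [0.9, 0.2, 0.5], 1: [0.3, 0.7, 0.8]}

def classify_with_missing(features):
    """features[j] is 1, 0, or None; None is marginalized out.

    Under naive Bayes, sum_{f_j} Pr(f_j | c) = 1, so a missing
    feature simply drops out of the product.
    """
    scores = {}
    for c in prior:
        logp = math.log(prior[c])
        for fj, theta in zip(features, likelihood[c]):
            if fj is None:
                continue  # marginalized: contributes a factor of 1
            logp += math.log(theta if fj else 1.0 - theta)
        scores[c] = logp
    return max(scores, key=scores.get)

print(classify_with_missing([1, None, 1]))  # second feature missing -> 0
```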
Conclusions
- We introduced a PSDD learning technique that improves classification performance by introducing a discriminative bias.
- Robustness is ensured by exploiting the generative learning strategy.
- The proposed technique outperforms purely generative PSDDs in terms of classification accuracy, and outperforms the other baseline classifiers in terms of robustness.
References
- Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst and Guy Van den Broeck. Towards Hardware-Aware Tractable Learning of Probabilistic Models. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.
- YooJung Choi, Antonio Vergari, Robert Peharz and Guy Van den Broeck. Probabilistic Circuits: Representation and Inference. AAAI tutorial, 2020.
- Yitao Liang, Jessa Bekker and Guy Van den Broeck. Learning the Structure of Probabilistic Sentential Decision Diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
- Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014.

Thank you!
Contact: laura.galindez@esat.kuleuven.be