  1. On Robust Trimming of Bayesian Network Classifiers YooJung Choi and Guy Van den Broeck UCLA

  2. Bayesian Network Classifiers [Figure: a Bayesian network over a Class node, a Latent node, and feature nodes Test 1–Test 4 (the Features)]

  3. Bayesian Network Classifiers [Same figure as slide 2]

  4. Bayesian Network Classifiers: Can we make the same classifications with fewer features? [Same figure]

  5. Why Classification Similarity? To preserve classification behavior on individual examples • Fairness • Deployed classifiers

  6. How to measure Similarity? “Expected Classification Agreement”: what is the expected probability that a classifier α will agree with its trimming β?
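A sketch of the definition (the notation here is assumed, not copied from the slides): writing F for the full feature set and F′ ⊆ F for the retained features, the expected classification agreement between α and a trimming β over F′ is

    A(\alpha, \beta) \;=\; \mathbb{E}_{f \sim \Pr(F)}\,\mathbb{1}\!\left[\alpha(f) = \beta(f')\right] \;=\; \sum_{f} \Pr(f)\,\mathbb{1}\!\left[\alpha(f) = \beta(f')\right]

where f′ denotes the restriction of the full instantiation f to F′.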

  7. Robust Trimming [Slide diagram: the trimmed classifier should stay similar to the original classifier]
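One way to read the robust trimming problem (a hedged reconstruction; the agreement threshold δ is an assumed symbol):

    \min_{F' \subseteq F} \; |F'| \quad \text{subject to} \quad \max_{\beta \text{ over } F'} A(\alpha, \beta) \;\ge\; \delta

that is, keep as few features as possible while some trimmed classifier is still guaranteed to agree with the original on at least a δ fraction of instances, in expectation.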

  8. Trimming Algorithm: feature subset selection (the search), with the “Maximum Achievable Agreement” (MAA) as the objective function
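In symbols (a sketch; the exact class of admissible trimmed classifiers is the one fixed by the paper's setup), the objective for a candidate feature subset F′ is

    \mathrm{MAA}(F') \;=\; \max_{\beta \text{ over } F'} \; A(\alpha, \beta)

the best agreement with the original classifier α that any trimming over F′ can achieve.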

  9. Trimming Algorithm • Branch-and-Bound search

  10. Trimming Algorithm • Branch-and-Bound search • Need a bound for MAA to prune subtrees
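A minimal Python sketch of this search, assuming hypothetical oracles maa(S) and mpa(S) that score a feature subset S (neither is a real library call; the pruning bound MPA is introduced on the next slide, and fixing the subset size k is a simplification of the paper's setting):

    # Branch-and-bound over feature subsets: at each node, either keep or
    # discard the next feature. A branch is pruned when even the upper
    # bound (MPA) of the largest subset still reachable below it cannot
    # beat the best MAA found so far.
    def trim(features, k, maa, mpa):
        best_score, best_subset = float("-inf"), None

        def search(kept, remaining):
            nonlocal best_score, best_subset
            if len(kept) == k:
                score = maa(frozenset(kept))
                if score > best_score:
                    best_score, best_subset = score, set(kept)
                return
            if len(kept) + len(remaining) < k:
                return  # not enough features left to reach size k
            # MPA is monotone, so it upper-bounds the MAA of every
            # size-k subset reachable from this node.
            if mpa(frozenset(kept) | frozenset(remaining)) <= best_score:
                return
            x, rest = remaining[0], remaining[1:]
            search(kept + [x], rest)  # branch 1: keep feature x
            search(kept, rest)        # branch 2: discard feature x

        search([], list(features))
        return best_subset, best_score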

  11. Upper bound for MAA: the “Maximum Potential Agreement” (MPA), i.e. the maximum agreement between α and a hypothetical function that maps each f’ to a class c

  12. Maximum Potential Agreement: 1. Upper-bounds the MAA (great for pruning!) 2. Monotonically increasing

  13. Maximum Potential Agreement: 1. Upper-bounds the MAA (great for pruning!) 2. Monotonically increasing 3. Generally easier to compute than the MAA 4. Equal to the MAA under some independence conditions (e.g. Naïve Bayes)
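Putting slides 11–13 together (a hedged reconstruction from the verbal definition, not a formula copied from the paper): since the hypothetical function may pick the best class for each f′ independently,

    \mathrm{MPA}(F') \;=\; \sum_{f'} \max_{c} \Pr\!\left(f',\, \alpha(F) = c\right)

which gives \mathrm{MAA}(F') \le \mathrm{MPA}(F') (the hypothetical function need not be a realizable trimmed classifier), together with the monotonicity \mathrm{MPA}(F'') \le \mathrm{MPA}(F') for F'' \subseteq F'; these are exactly the two properties the branch-and-bound pruning rule relies on.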

  14. Computing the MPA and MAA: prior works based on knowledge compilation. The network’s CPT entries (e.g. Pr(S₁ = + | E), with values 0.7 and 0.2 for E = + and E = −) become weights on indicator literals of a compiled logical circuit (equivalence constraints Q₁, …, Q₄ over E and S₁ with weights x(Q₁) = 0.7, x(Q₂) = 0.3, x(Q₃) = 0.2, x(Q₄) = 0.8, and x(m) = 1.0 for all other literals m). [Figure: example network, CPT, and compiled circuit] [Oztok, Choi, Darwiche 2016; Choi, Darwiche, Van den Broeck 2017]
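To illustrate only the weighted-model-counting semantics behind these compilation approaches (a toy, brute-force sketch; compiled circuits avoid this enumeration entirely, and nothing here is the cited papers' algorithm):

    # Toy weighted model counting by enumerating assignments. Knowledge
    # compilation instead evaluates a compiled circuit, in time linear
    # in the circuit size.
    from itertools import product

    def wmc(constraints, weights, variables):
        # constraints: functions mapping an assignment (dict var -> bool)
        #   to True/False; all must hold for a model to count.
        # weights: dict from (var, value) literals to floats; literals
        #   not listed default to weight 1.0.
        total = 0.0
        for values in product([False, True], repeat=len(variables)):
            model = dict(zip(variables, values))
            if all(c(model) for c in constraints):
                w = 1.0
                for var, val in model.items():
                    w *= weights.get((var, val), 1.0)
                total += w
        return total

With literal weights taken from CPT entries as on the slide, marginals such as Pr(S₁ = +) reduce to counts of this form.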

  15. Evaluation

  16. Evaluation: branch-and-bound improves efficiency (even with the extra upper-bound computations)

  17. Evaluation: high information gain does not lead to high classification agreement; information-theoretic measures are unaware of changes in the classification threshold

  18. Thank you! Questions?
