Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration


  1. Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration. Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach. NeurIPS 2019

  2. Contributions ● New parametric calibration method: Dirichlet calibration ● New regularisation method for matrix scaling (and for Dirichlet calibration): ODIR (Off-Diagonal and Intercept Regularisation) ● Multi-class classifier evaluation: confidence-calibrated (confidence-reliability diagram, confidence-ECE); classwise-calibrated (classwise-reliability diagrams, classwise-ECE); multiclass-calibrated

  3. Making classifiers more trustworthy

  4. Making classifiers more trustworthy ● A classifier with 60% accuracy on a set of instances

  6. Making classifiers more trustworthy ● A classifier with 60% accuracy on a set of instances ● If the classifier reports class probabilities

  7. Making classifiers more trustworthy ● A classifier with 60% accuracy on a set of instances ● If the classifier reports class probabilities, then we get instance-specific

  8. Trustworthy if confidence-calibrated

  16. Trustworthy if confidence-calibrated Confidence-calibrated:
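
As a sketch in notation assumed here rather than taken from the slide, write $\hat{p}(X)$ for the predicted probability vector and $Y$ for the true class. Confidence calibration is then commonly stated as:

```latex
\[
P\Bigl(Y = \arg\max_j \hat{p}_j(X) \;\Big|\; \max_j \hat{p}_j(X) = c\Bigr) = c
\qquad \text{for all } c \in [0, 1].
\]
```

In words: among all instances predicted with confidence $c$, a fraction $c$ should be classified correctly.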

  17. Deep nets are usually over-confident ● Experimental setup: CIFAR-10, ResNet Wide 32 ● Accuracy: overall 94% ● Confidence-calibrated check: accuracy at 90% confidence is 58%

  19. Example: uncalibrated predictions ● Experimental setup: CIFAR-10, ResNet Wide 32 ● Accuracy: overall 94% ● Confidence-calibrated check: accuracy at 90% confidence is 58%
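
The gap between stated confidence and observed accuracy on this slide is what confidence-ECE summarises. The following is a minimal sketch, assuming equal-width bins and a default of 15 bins; the function name `confidence_ece` and the bin count are illustrative choices, not taken from the slides.

```python
import numpy as np

def confidence_ece(probs, labels, n_bins=15):
    """Confidence-ECE: bin instances by top-class confidence and average
    the |accuracy - mean confidence| gap, weighted by bin size."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)                        # top-class probability
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only: bin b holds confidences in (edges[b], edges[b + 1]].
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap                       # weight by share of instances
    return ece
```

For instance, a set of predictions that all carry 90% confidence but are right only 58% of the time would contribute a gap of 0.32 in the bin around 0.9.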

  20. Example: after calibration with temperature scaling ● Experimental setup: CIFAR-10, ResNet Wide 32 ● Accuracy: overall 94%; at 90% confidence 58% ● Accuracy after temperature scaling: overall 94%; at 90% confidence 88%

  21. Example: after calibration with temperature scaling ● Experimental setup: CIFAR-10, ResNet Wide 32 ● Accuracy: overall 94%; at 90% confidence 58% ● Accuracy after temperature scaling: overall 94%; at 90% confidence 88% ● Classwise-calibrated check after temperature scaling: at 90% class 2 probability, 76%

  22. Example: after calibration with Dirichlet calibration ● Experimental setup: CIFAR-10, ResNet Wide 32 ● Accuracy: overall 94%; at 90% confidence 58% ● Accuracy after temperature scaling: overall 94%; at 90% confidence 88% ● Classwise-calibrated check: at 90% class 2 probability, 76% after temperature scaling and 90% after Dirichlet calibration
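
The classwise check above looks at one class's predicted probability at a time instead of only the top class. A hedged sketch of classwise-ECE follows, assuming equal-width bins and averaging over classes; the name `classwise_ece` and the default of 15 bins are illustrative assumptions.

```python
import numpy as np

def classwise_ece(probs, labels, n_bins=15):
    """Classwise-ECE: for every class j, bin instances by the predicted
    probability of class j and compare it with the observed frequency of
    class j in that bin; average the weighted gaps over classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_classes = probs.shape[1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for j in range(n_classes):
        p_j = probs[:, j]                          # predicted probability of class j
        y_j = (labels == j).astype(float)          # 1 where the true class is j
        bin_ids = np.digitize(p_j, edges[1:-1], right=True)
        for b in range(n_bins):
            mask = bin_ids == b
            if mask.any():
                total += mask.mean() * abs(y_j[mask].mean() - p_j[mask].mean())
    return total / n_classes
```

Unlike confidence-ECE, this measure also penalises miscalibrated probabilities for classes that are not the predicted one.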

  23. How to calibrate a multi-class classifier? Diagram: input layer → any feed-forward network → last hidden layer → logits → softmax → class probabilities

  24. Temperature scaling. Diagram: input layer → any feed-forward network (frozen) → last hidden layer → logits → temperature scaling → scaled logits → softmax → class probabilities. Parameters: a single scalar temperature. Reference: C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017
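
Temperature scaling divides the frozen logits by one scalar before the softmax. The sketch below fits that scalar by minimising log-loss on held-out data; the grid search is a simplification chosen here for brevity, and the names `fit_temperature` and `nll` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    """Mean negative log-likelihood (log-loss) of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 400)):
    """Return the temperature on the grid that minimises validation log-loss.
    Assumption: a plain grid search stands in for a proper optimiser."""
    losses = [nll(softmax(logits / t), labels) for t in grid]
    return float(grid[int(np.argmin(losses))])
```

Because every logit is divided by the same positive number, the predicted class never changes, which is consistent with the unchanged overall accuracy reported on the example slides.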

  25. Vector scaling. Diagram: input layer → any feed-forward network (frozen) → last hidden layer → logits → vector scaling → scaled logits → softmax → class probabilities. Parameters: a per-class scale vector and a bias vector (Guo et al., ICML 2017)

  26. Matrix scaling. Diagram: input layer → any feed-forward network (frozen) → last hidden layer → logits → matrix scaling → scaled logits → softmax → class probabilities. Parameters: a full weight matrix and a bias vector (Guo et al., ICML 2017)
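
The three logit-space methods differ only in how much freedom the linear map on the logits has. A minimal sketch of the forward passes, with parameter names `w`, `W` and `b` assumed here for illustration:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def vector_scaling(logits, w, b):
    """softmax(w * z + b): one scale and one bias per class."""
    return softmax(logits * w + b)

def matrix_scaling(logits, W, b):
    """softmax(W z + b): a full k-by-k matrix lets classes interact."""
    return softmax(logits @ W.T + b)
```

Vector scaling is matrix scaling with a diagonal `W`, and temperature scaling is vector scaling with every entry of `w` equal to the same value and `b` set to zero.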

  27. Dirichlet calibration can calibrate any classifier. Diagram: any probabilistic multi-class classifier → class probabilities

  28. Parametric calibration methods ● Binary classification: Platt scaling [1] (logit space; derived from Gaussian distribution); Beta calibration [2] (+ constrained variants) (class probability space; derived from Beta distribution) ● Multi-class classification. References: [1] J. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61–74, MIT Press, 2000. [2] M. Kull, T. Silva Filho, P. Flach. Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers. AISTATS 2017

  29. Parametric calibration methods ● Binary classification: Platt scaling [1] (logit space; derived from Gaussian distribution); Beta calibration [2] (+ constrained variants) (class probability space; derived from Beta distribution) ● Multi-class classification: Dirichlet calibration (+ constrained variants) (class probability space; derived from Dirichlet distribution)

  30. Parametric calibration methods ● Binary classification: Platt scaling [1] (logit space; derived from Gaussian distribution); Beta calibration [2] (+ constrained variants) (class probability space; derived from Beta distribution) ● Multi-class classification: Matrix scaling [3] (+ vector scaling, temperature scaling) (logit space); Dirichlet calibration (+ constrained variants) (class probability space; derived from Dirichlet distribution). Reference: [3] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On Calibration of Modern Neural Networks. ICML 2017

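  31. Dirichlet calibration. Diagram: any probabilistic multi-class classifier → class probabilities → log-transform → fully-connected linear layer → softmax. Parameters: the weight matrix and bias vector of the linear layer. Regularisation: ● L2 ● ODIR (Off-Diagonal and Intercept Regularisation)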
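
The pipeline on this slide can be sketched as a log-transform followed by a linear layer and a softmax. In the sketch below the names `dirichlet_calibrate` and `odir_penalty`, the clipping constant `eps`, and the exact normalisation of the penalty terms are assumptions made for illustration; ODIR is modelled as separate squared penalties on the off-diagonal weights and on the intercepts, with strengths `lam` and `mu`.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dirichlet_calibrate(probs, W, b, eps=1e-12):
    """softmax(W ln(q) + b) applied to each row q of class probabilities."""
    log_q = np.log(np.clip(probs, eps, 1.0))   # clip to avoid log(0)
    return softmax(log_q @ W.T + b)

def odir_penalty(W, b, lam, mu):
    """Off-diagonal and intercept penalty: the diagonal of W is left free,
    off-diagonal weights and intercepts are shrunk towards zero."""
    k = W.shape[0]
    off_diag = W[~np.eye(k, dtype=bool)]       # all entries except the diagonal
    return lam * np.sum(off_diag ** 2) / (k * (k - 1)) + mu * np.sum(b ** 2) / k
```

With `W` equal to the identity and `b` equal to zero the map returns its input unchanged, so a classifier that is already calibrated can be left as it is.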

  32. Non-neural experiments ● 21 datasets x 11 classifiers = 231 settings ● Average rank ○ Classwise-ECE ○ Log-loss ○ Error rate

  33. Which classifiers are calibrated?

  35. Deep Neural Networks Experiments: Settings - 3 datasets: CIFAR-10, CIFAR-100, SVHN - 11 convolutional NNs + 3 pretrained

  36. Neural experiments ● Datasets: CIFAR-10, CIFAR-100, SVHN ● 11 CNNs trained as in Guo et al. + 3 pretrained ● Reported metrics: log-loss, classwise-ECE

  37. Conclusion 1. Dirichlet calibration: New parametric general-purpose multiclass calibration method a. Natural extension of two-class Beta calibration b. Easy to implement with multinomial logistic regression on log-transformed class probabilities 2. Best or tied best performance with 21 datasets x 11 classifiers 3. Advances state-of-the-art on Neural Networks by introducing ODIR regularisation
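
Point 1b can be illustrated with a small NumPy sketch: multinomial logistic regression fitted on log-transformed class probabilities. This is an assumption-laden toy, not the authors' implementation: it uses plain gradient descent with a fixed step size, no regularisation, and hypothetical names (`fit_dirichlet`, `dirichlet_predict`, `log_loss`).

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def log_loss(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_dirichlet(probs, labels, n_iter=2000, lr=0.05, eps=1e-12):
    """Fit W and b of softmax(W ln(q) + b) by gradient descent on log-loss."""
    X = np.log(np.clip(probs, eps, 1.0))       # log-transformed probabilities
    n, k = X.shape
    Y = np.eye(k)[labels]                      # one-hot targets
    W, b = np.eye(k), np.zeros(k)              # start from the identity map
    for _ in range(n_iter):
        P = softmax(X @ W.T + b)
        G = (P - Y) / n                        # gradient w.r.t. pre-softmax scores
        W -= lr * (G.T @ X)
        b -= lr * G.sum(axis=0)
    return W, b

def dirichlet_predict(probs, W, b, eps=1e-12):
    """Apply the fitted calibration map to new class probabilities."""
    return softmax(np.log(np.clip(probs, eps, 1.0)) @ W.T + b)
```

In practice the map would be fitted on a held-out validation set and then applied to test predictions; the step size here was chosen for small examples and may need tuning on other data.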
