Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat

Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation
Amr M. Alexandari*, Anshul Kundaje†, Avanti Shrikumar*†
*co-first authors; †co-corresponding authors
Amr Alexandari, PhD Student, Dept. of Computer Science
Anshul Kundaje, Assistant Professor, Depts. of CS & Genetics

Label Shift Illustrated (figure: train model)

Label Shift Illustrated (figure: original model under-predicts)

Label Shift Illustrated (figure: update)

Label Shift Illustrated: We don’t have ground-truth labels for the new patients! How do we update our classifier?

Main Contributions
- An approach that achieves state-of-the-art performance on label shift adaptation:
  - Scales to datasets with high-dimensional inputs
  - Does not require model retraining
- Combines maximum likelihood with specific types of calibration:
  - Calibration with Temperature Scaling (TS) was insufficient (and sometimes harmful!)
  - Achieved state-of-the-art results with extensions of TS (one of which we propose) that correct for systematic bias
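The slide does not spell out the calibration formulas. As a minimal sketch, assuming the bias-corrected extension has the form softmax(logits / T + b), with one learned bias per class added to temperature scaling (our reading of "extensions of TS that correct for systematic bias"), the code below fits both plain TS and this variant by minimizing negative log-likelihood on held-out, labelled source-domain data. The function names, the gradient-descent optimizer and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(a):
    a = a - a.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def nll(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))


def fit_calibration(logits, labels, bias=True, lr=0.1, steps=2000):
    """Fit softmax(logits / T + b) to held-out labelled source data by minimising NLL.

    bias=False: plain temperature scaling (b stays at 0).
    bias=True:  temperature scaling plus one learned bias per class (assumed form).
    Full-batch gradient descent; a step is accepted only if it lowers the NLL,
    otherwise the learning rate is halved.
    """
    n, k = logits.shape
    onehot = np.eye(k)[labels]
    s, b = 1.0, np.zeros(k)  # s = 1 / T
    loss = nll(logits * s + b, labels)
    for _ in range(steps):
        g = (softmax(logits * s + b) - onehot) / n  # dNLL / d(calibrated logits)
        new_s = s - lr * np.sum(g * logits)
        new_b = b - lr * g.sum(axis=0) if bias else b
        new_loss = nll(logits * new_s + new_b, labels)
        if new_loss <= loss:
            s, b, loss = new_s, new_b, new_loss
        else:
            lr *= 0.5  # step too large: shrink and retry
    return 1.0 / s, b


# Hypothetical synthetic data: a model whose logits are overconfident (x3)
# and systematically biased toward class 0.
rng = np.random.default_rng(0)
true_logits = rng.normal(size=(5000, 3))
cum = softmax(true_logits).cumsum(axis=1)
labels = np.minimum((cum < rng.random((5000, 1))).sum(axis=1), 2)  # inverse-CDF sampling
model_logits = 3.0 * true_logits + np.array([1.0, -1.0, 0.0])

T_ts, _ = fit_calibration(model_logits, labels, bias=False)
T_bc, b_bc = fit_calibration(model_logits, labels, bias=True)
print(nll(model_logits, labels),
      nll(model_logits / T_ts, labels),
      nll(model_logits / T_bc + b_bc, labels))
```

In this toy setup, a single temperature can undo the overconfidence but not the per-class offset, which is the kind of systematic bias the extra bias terms are meant to absorb.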

Formal Definition of Label Shift
Let:
- 𝑧 denote our labels (whether or not a person has the disease)
- 𝒚 denote the observed symptoms
- 𝑞(𝒚, 𝑧) denote the joint distribution of (𝒚, 𝑧) at the beginning of the outbreak (the “source domain”)
- 𝑟(𝒚, 𝑧) denote the joint distribution at the widespread stage (the “target domain”), when we don’t know the labels
- Goal: adapt a source-domain classifier that predicts 𝑞(𝑧|𝒚) so that it instead predicts 𝑟(𝑧|𝒚) for the target domain
Core assumption: the disease has the same symptoms irrespective of outbreak stage, i.e. 𝑞(𝒚|𝑧) = 𝑟(𝒚|𝑧).
- Thus, the difference between the source and target domains is caused exclusively by the shift in label proportions 𝑞(𝑧) and 𝑟(𝑧). Formally, 𝑟(𝒚, 𝑧) = 𝑞(𝒚|𝑧) 𝑟(𝑧).
- Also called prior probability shift (Amos, 2008); it corresponds to “anti-causal learning”, i.e. predicting the cause 𝑧 from its effects 𝒚 (Schölkopf, 2012).
- Anti-causal learning is appropriate here because disease status 𝑧 causes the symptoms 𝒚.
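To make the factorization 𝑟(𝒚, 𝑧) = 𝑞(𝒚|𝑧) 𝑟(𝑧) concrete, here is a small numeric sketch with hypothetical toy numbers (two labels, three discrete symptom patterns; every value is invented for illustration). Both joints are built from the same class-conditional table, so 𝑞(𝒚|𝑧) = 𝑟(𝒚|𝑧) holds by construction, while the posteriors over 𝑧 differ because the label proportions differ.

```python
import numpy as np

# Hypothetical toy numbers. Rows index the label z (0 = healthy, 1 = diseased);
# columns index three discrete symptom patterns y.
q_y_given_z = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])  # shared by both domains under label shift
q_z = np.array([0.9, 0.1])  # source label proportions (start of outbreak)
r_z = np.array([0.6, 0.4])  # target label proportions (widespread stage)

# Both joints factor through the same class-conditional distribution.
q_joint = q_y_given_z * q_z[:, None]  # q(y, z) = q(y|z) q(z)
r_joint = q_y_given_z * r_z[:, None]  # r(y, z) = q(y|z) r(z)


def y_given_z(joint):
    """Class-conditional distribution: normalise each row (fixed z) over y."""
    return joint / joint.sum(axis=1, keepdims=True)


def z_given_y(joint):
    """Posterior over labels: normalise each column (fixed y) over z."""
    return joint / joint.sum(axis=0, keepdims=True)


print(np.allclose(y_given_z(q_joint), y_given_z(r_joint)))  # True: symptoms per class unchanged
print(z_given_y(q_joint)[:, 2])  # source posterior for symptom pattern 2
print(z_given_y(r_joint)[:, 2])  # target posterior for symptom pattern 2
```

With these numbers, symptom pattern 2 gives a diseased probability of 0.4 under 𝑞 but 0.8 under 𝑟, which is exactly the gap a classifier trained only on source data fails to account for.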

Estimating 𝑟(𝑧|𝒚) with Bayes’ Rule
- Although 𝑞(𝒚|𝑧) is preserved across domains, estimating it is hard when 𝒚 is high-dimensional.
- It is much easier to estimate 𝑞(𝑧|𝒚) and 𝑞(𝑧) from the source domain, as 𝑧 is lower-dimensional.
- If we know 𝑟(𝑧), we can recover 𝑟(𝑧|𝒚) using Bayes’ rule without ever estimating 𝑞(𝒚|𝑧) (first shown in Saerens et al., 2002).

We first write
$$r(z \mid \boldsymbol{y}) = \frac{r(z, \boldsymbol{y})}{r(\boldsymbol{y})} = \frac{r(\boldsymbol{y} \mid z)\, r(z)}{\sum_{z^*} r(\boldsymbol{y} \mid z^*)\, r(z^*)}$$
where the target-domain terms 𝑟(𝑧, 𝒚), 𝑟(𝒚) and 𝑟(𝒚|𝑧) are not explicitly known, since target labels are unavailable.
Substituting 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧) (the label shift assumption), we have
$$r(z \mid \boldsymbol{y}) = \frac{q(\boldsymbol{y} \mid z)\, r(z)}{\sum_{z^*} q(\boldsymbol{y} \mid z^*)\, r(z^*)}$$
Through Bayes’ rule, observe that
$$q(\boldsymbol{y} \mid z) = \frac{q(z \mid \boldsymbol{y})\, q(\boldsymbol{y})}{q(z)}$$
Substituting, we get
$$r(z \mid \boldsymbol{y}) = \frac{\dfrac{q(z \mid \boldsymbol{y})\, q(\boldsymbol{y})}{q(z)}\, r(z)}{\sum_{z^*} \dfrac{q(z^* \mid \boldsymbol{y})\, q(\boldsymbol{y})}{q(z^*)}\, r(z^*)}$$
𝑞(𝒚) cancels out, giving
$$r(z \mid \boldsymbol{y}) = \frac{\dfrac{q(z \mid \boldsymbol{y})}{q(z)}\, r(z)}{\sum_{z^*} \dfrac{q(z^* \mid \boldsymbol{y})}{q(z^*)}\, r(z^*)}$$
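The final formula translates directly into a few lines of array code. A minimal sketch, assuming the classifier's source-domain posteriors 𝑞(𝑧|𝒚) are stored row-wise in a NumPy array and that 𝑟(𝑧) is given; the function name `adapt_to_target_prior` is hypothetical.

```python
import numpy as np


def adapt_to_target_prior(source_probs, source_prior, target_prior):
    """Bayes-rule correction under label shift (after Saerens et al., 2002).

    source_probs: (n, k) predicted source posteriors q(z|y), one row per input.
    source_prior: (k,) source label proportions q(z).
    target_prior: (k,) target label proportions r(z), assumed known here.
    Returns the (n, k) target posteriors r(z|y).
    """
    source_probs = np.asarray(source_probs, dtype=float)
    ratio = np.asarray(target_prior, dtype=float) / np.asarray(source_prior, dtype=float)
    unnormalized = source_probs * ratio  # numerator: q(z|y) r(z) / q(z)
    # Denominator: sum of the same quantity over all labels z*.
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


q_post = np.array([[0.5, 0.5],
                   [0.9, 0.1]])
r_post = adapt_to_target_prior(q_post, source_prior=[0.5, 0.5], target_prior=[0.8, 0.2])
print(r_post)  # first row becomes [0.8, 0.2]: an uninformative prediction just tracks r(z)
```

Only the ratio 𝑟(𝑧)/𝑞(𝑧) enters the correction, which is why 𝑞(𝒚|𝑧) never needs to be modelled.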

Reminders:
- 𝒚 denotes features (e.g. symptoms)
- 𝑧 denotes labels (e.g. disease status)
- 𝑞 indicates the source domain (labels known)
- 𝑟 indicates the target domain (labels unknown)
- Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
- If we estimate 𝑞(𝑧|𝒚) and 𝑞(𝑧) from source data and are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule

A Simple Iterative Approach to Label Shift…
In practice, we are not told 𝑟(𝑧). How can we estimate it?
- We could use 𝑞(𝑧|𝒚) to predict on the test set and average the predictions to estimate 𝑟(𝑧).
- We could then use that estimate of 𝑟(𝑧) to update the predictions of 𝑞(𝑧|𝒚) via Bayes’ rule, re-average them, and repeat the process until convergence!
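This iteration is the expectation-maximization (EM) procedure of Saerens et al. (2002). A minimal NumPy sketch, assuming the rows of `source_probs` are the (ideally calibrated) source-model predictions 𝑞(𝑧|𝒚) on the unlabelled target set; the function name, tolerance and toy data are hypothetical.

```python
import numpy as np


def em_label_shift(source_probs, source_prior, max_iter=1000, tol=1e-8):
    """Estimate the target prior r(z) by EM, re-weighting predictions each round.

    source_probs: (n, k) source-model posteriors q(z|y) on unlabelled target inputs.
    source_prior: (k,) source label proportions q(z).
    Returns (estimated r(z), adapted posteriors r(z|y)).
    """
    source_probs = np.asarray(source_probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)

    def adapt(prior):
        # Bayes-rule correction of every prediction using the current prior estimate.
        unnormalized = source_probs * (prior / source_prior)
        return unnormalized / unnormalized.sum(axis=1, keepdims=True)

    target_prior = source_prior.copy()  # first pass = plain average of q(z|y)
    for _ in range(max_iter):
        new_prior = adapt(target_prior).mean(axis=0)  # average the updated predictions
        converged = np.max(np.abs(new_prior - target_prior)) < tol
        target_prior = new_prior
        if converged:
            break
    return target_prior, adapt(target_prior)


# Hypothetical toy check: three discrete symptom patterns with known q(y|z).
lik = np.array([[0.7, 0.2, 0.1],   # q(y|z=0)
                [0.1, 0.3, 0.6]])  # q(y|z=1)
q_z = np.array([0.5, 0.5])
posterior = (lik * q_z[:, None]).T
posterior /= posterior.sum(axis=1, keepdims=True)  # q(z|y), one row per pattern
# Target sample drawn in exact proportion to r(y) when r(z) = [0.8, 0.2].
patterns = np.repeat([0, 1, 2], [58, 22, 20])
est_prior, _ = em_label_shift(posterior[patterns], q_z)
print(est_prior)  # close to [0.8, 0.2]
```

Initializing 𝑟(𝑧) at 𝑞(𝑧) makes the first pass identical to simply averaging the source predictions, matching the first step on the slide; each later pass re-weights with the latest estimate.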
