Logistic Regression Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824
Administrative • Please start HW 1 early! • Questions are welcome!
Two principles for estimating parameters
• Maximum Likelihood Estimate (MLE): choose θ that maximizes the probability of the observed data
  θ_MLE = argmax_θ P(Data | θ)
• Maximum a posteriori (MAP) estimation: choose θ that is most probable given the prior probability and the data
  θ_MAP = argmax_θ P(θ | Data) = argmax_θ P(Data | θ) P(θ) / P(Data)
Slide credit: Tom Mitchell
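A quick numerical illustration of the two principles (not from the slides): a minimal sketch that estimates the bias of a coin, where the flip data and the Beta(2, 2) prior are made-up assumptions.

```python
import numpy as np

# Hypothetical data: 10 coin flips, 7 heads (illustrative only).
flips = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

# MLE: the theta that maximizes P(Data | theta) under a Bernoulli model
theta_mle = flips.mean()                                       # 7/10 = 0.7

# MAP with a Beta(a, b) prior on theta: mode of the posterior Beta
a, b = 2.0, 2.0                                                # prior "imaginary" counts
theta_map = (flips.sum() + a - 1) / (len(flips) + a + b - 2)   # 8/12 ≈ 0.667

print(theta_mle, theta_map)
```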
Naïve Bayes classifier
• Want to learn P(Y | X_1, ⋯, X_n)
• But this requires 2^n parameters...
• How about applying Bayes rule?
  P(Y | X_1, ⋯, X_n) = P(X_1, ⋯, X_n | Y) P(Y) / P(X_1, ⋯, X_n) ∝ P(X_1, ⋯, X_n | Y) P(Y)
  • P(X_1, ⋯, X_n | Y): needs (2^n − 1) × 2 parameters
  • P(Y): needs 1 parameter
• Apply the conditional independence assumption:
  P(X_1, ⋯, X_n | Y) = Π_{i=1}^n P(X_i | Y): needs n × 2 parameters
Naïve Bayes classifier
• Bayes rule:
  P(Y = y_k | X_1, ⋯, X_n) = P(Y = y_k) P(X_1, ⋯, X_n | Y = y_k) / Σ_j P(Y = y_j) P(X_1, ⋯, X_n | Y = y_j)
• Assume conditional independence among the X_i's:
  P(Y = y_k | X_1, ⋯, X_n) = P(Y = y_k) Π_i P(X_i | Y = y_k) / Σ_j P(Y = y_j) Π_i P(X_i | Y = y_j)
• Pick the most probable Y:
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)
Slide credit: Tom Mitchell
Example
• P(Y | X_1, X_2) ∝ P(Y) P(X_1, X_2 | Y)   (Bayes rule)
                  = P(Y) P(X_1 | Y) P(X_2 | Y)   (conditional independence)
• Estimated parameters:
  P(Y = 1) = 0.4    P(Y = 0) = 0.6
  P(X_1 = 1 | Y = 1) = 0.2    P(X_1 = 0 | Y = 1) = 0.8
  P(X_1 = 1 | Y = 0) = 0.7    P(X_1 = 0 | Y = 0) = 0.3
  P(X_2 = 1 | Y = 1) = 0.3    P(X_2 = 0 | Y = 1) = 0.7
  P(X_2 = 1 | Y = 0) = 0.9    P(X_2 = 0 | Y = 0) = 0.1
• Test example: X_1 = 1, X_2 = 0
  Y = 1: P(Y = 1) P(X_1 = 1 | Y = 1) P(X_2 = 0 | Y = 1) = 0.4 × 0.2 × 0.7 = 0.056
  Y = 0: P(Y = 0) P(X_1 = 1 | Y = 0) P(X_2 = 0 | Y = 0) = 0.6 × 0.7 × 0.1 = 0.042
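The arithmetic above can be reproduced directly; this is a small sketch with the slide's parameters hard-coded (the dictionary layout is just one possible encoding).

```python
# Parameters from the slide's example
P_Y  = {1: 0.4, 0: 0.6}
P_X1 = {(1, 1): 0.2, (0, 1): 0.8, (1, 0): 0.7, (0, 0): 0.3}   # P(X1 = x | Y = y), keyed by (x, y)
P_X2 = {(1, 1): 0.3, (0, 1): 0.7, (1, 0): 0.9, (0, 0): 0.1}   # P(X2 = x | Y = y), keyed by (x, y)

x1, x2 = 1, 0                                      # test example
scores = {y: P_Y[y] * P_X1[(x1, y)] * P_X2[(x2, y)] for y in (0, 1)}
print(scores)                                      # ≈ {0: 0.042, 1: 0.056}
print(max(scores, key=scores.get))                 # 1
```

Since 0.056 > 0.042, the classifier predicts Y = 1 for this test example.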
Naïve Bayes algorithm – discrete X_i
• For each value y_k
  Estimate π_k = P(Y = y_k)
  For each value x_ij of each attribute X_i
    Estimate θ_ijk = P(X_i = x_ij | Y = y_k)
• Classify X^test
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^test | Y = y_k)
  Y ← argmax_{y_k} π_k Π_i θ_ijk
Slide credit: Tom Mitchell
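A minimal sketch of this train-then-classify loop, assuming categorical features stored in a NumPy array; the names train_nb and classify_nb are illustrative, not from the slides.

```python
from collections import Counter
import numpy as np

def train_nb(X, y):
    """Estimate pi_k = P(Y = y_k) and theta_ijk = P(X_i = x_ij | Y = y_k) by counting."""
    n_examples, n_features = X.shape
    pi = {k: c / n_examples for k, c in Counter(y).items()}
    theta = {}                                     # theta[(i, k)] maps a feature value to its probability
    for k in pi:
        rows = X[y == k]
        for i in range(n_features):
            counts = Counter(rows[:, i])
            theta[(i, k)] = {v: c / len(rows) for v, c in counts.items()}
    return pi, theta

def classify_nb(x_test, pi, theta):
    """Return argmax_k pi_k * prod_i theta_ijk for a test example."""
    scores = {}
    for k, pk in pi.items():
        score = pk
        for i, v in enumerate(x_test):
            score *= theta[(i, k)].get(v, 0.0)     # unseen value -> zero probability (see Subtlety #2)
        scores[k] = score
    return max(scores, key=scores.get)
```

train_nb produces the count-based MLE estimates shown on the next slide, and classify_nb applies the same decision rule as the worked example above.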
Estimating parameters: discrete Y, X_i
• Maximum likelihood estimates (MLE):
  π̂_k = P̂(Y = y_k) = #D{Y = y_k} / |D|
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = #D{X_i = x_ij ∧ Y = y_k} / #D{Y = y_k}
Slide credit: Tom Mitchell
• F = 1 iff you live in Fox Ridge
• S = 1 iff you watched the Super Bowl last night
• D = 1 iff you drive to VT
• G = 1 iff you went to the gym in the last month
Estimate:
P(F = 1) =       P(F = 0) =
P(S = 1 | F = 1) =    P(S = 0 | F = 1) =    P(S = 1 | F = 0) =    P(S = 0 | F = 0) =
P(D = 1 | F = 1) =    P(D = 0 | F = 1) =    P(D = 1 | F = 0) =    P(D = 0 | F = 0) =
P(G = 1 | F = 1) =    P(G = 0 | F = 1) =    P(G = 1 | F = 0) =    P(G = 0 | F = 0) =
P(F | S, D, G) ∝ P(F) P(S | F) P(D | F) P(G | F)
Naïve Bayes: Subtlety #1
• Often the X_i are not really conditionally independent
• Naïve Bayes often works pretty well anyway
  • Often gives the right classification, even when the probabilities are not right [Domingos & Pazzani, 1996]
• What is the effect on the estimated P(Y | X)?
  • What if we have two copies of a feature, X_i = X_k?
  P(Y = y_k | X_1, ⋯, X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)
Slide credit: Tom Mitchell
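To see the effect of violated independence, reuse the earlier worked example but pretend X_2 was (incorrectly) counted twice; this loop is a hypothetical illustration, not part of the slides.

```python
# Duplicate-feature effect on P(Y | X), reusing the slide's example parameters.
P_Y  = {1: 0.4, 0: 0.6}
P_X1 = {(1, 1): 0.2, (0, 1): 0.8, (1, 0): 0.7, (0, 0): 0.3}
P_X2 = {(1, 1): 0.3, (0, 1): 0.7, (1, 0): 0.9, (0, 0): 0.1}

x1, x2 = 1, 0
for copies in (1, 2):                         # 1 = correct model, 2 = X2 counted twice
    s = {y: P_Y[y] * P_X1[(x1, y)] * P_X2[(x2, y)] ** copies for y in (0, 1)}
    z = sum(s.values())
    print(copies, {y: round(v / z, 3) for y, v in s.items()})
# With one copy, P(Y = 1 | x) ≈ 0.571; with the duplicated feature it rises to ≈ 0.903.
# The classification stays the same, but the estimated probability becomes overconfident.
```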
Naïve Bayes: Subtlety #2
• The MLE estimate for P(X_i | Y = y_k) might be zero
  (for example, X_i = birthdate, X_i = Feb_4_1995)
• Why worry about just one parameter out of many?
  P(Y = y_k | X_1, ⋯, X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)
  A single zero factor drives the whole product to zero, no matter what the other features say.
• What can we do to address this?
  • MAP estimates (adding "imaginary" examples)
Slide credit: Tom Mitchell
Estimating parameters: discrete Y, X_i
• Maximum likelihood estimates (MLE):
  π̂_k = P̂(Y = y_k) = #D{Y = y_k} / |D|
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = #D{X_i = x_ij, Y = y_k} / #D{Y = y_k}
• MAP estimates (Dirichlet priors):
  π̂_k = P̂(Y = y_k) = (#D{Y = y_k} + (β_k − 1)) / (|D| + Σ_m (β_m − 1))
  θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = (#D{X_i = x_ij, Y = y_k} + (β_k − 1)) / (#D{Y = y_k} + Σ_m (β_m − 1))
Slide credit: Tom Mitchell
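With a symmetric prior, the MAP estimate of θ_ijk amounts to adding the same pseudo-count to every feature value; here is a minimal sketch, where the function name and the default β = 2 (Laplace smoothing) are assumptions for illustration.

```python
import numpy as np

def map_estimate_theta(X_col, y, value, k, beta=2.0, n_values=2):
    """MAP estimate of P(X_i = value | Y = k) under a symmetric Dirichlet(beta) prior.
    beta = 2 adds one "imaginary" example per feature value (Laplace smoothing),
    so the estimate can never be exactly zero."""
    num = np.sum((X_col == value) & (y == k)) + (beta - 1)
    den = np.sum(y == k) + n_values * (beta - 1)
    return num / den

# Even if a value never co-occurs with class k in the data, the estimate stays positive.
X_col = np.array([0, 0, 1, 1])
y     = np.array([0, 0, 1, 1])
print(map_estimate_theta(X_col, y, value=0, k=1))   # 0.25 instead of the MLE's 0.0
```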
What if we have continuous X_i?
• Gaussian Naïve Bayes (GNB): assume
  P(X_i = x | Y = y_k) = (1 / (√(2π) σ_ik)) exp(−(x − μ_ik)² / (2σ_ik²))
• Possible additional assumptions on σ_ik:
  • independent of Y (σ_i)
  • independent of X_i (σ_k)
  • independent of both X_i and Y (σ)
Slide credit: Tom Mitchell
Naïve Bayes algorithm – continuous X_i
• For each value y_k
  Estimate π_k = P(Y = y_k)
  For each attribute X_i, estimate the class-conditional mean μ_ik and variance σ_ik²
• Classify X^test
  Y ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^test | Y = y_k)
  Y ← argmax_{y_k} π_k Π_i Normal(X_i^test; μ_ik, σ_ik)
Slide credit: Tom Mitchell
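A minimal sketch of Gaussian Naïve Bayes along these lines, assuming real-valued features in a NumPy array; computing in log space and adding a small variance floor are implementation conveniences, not part of the slides.

```python
import numpy as np

def train_gnb(X, y):
    """Estimate pi_k, mu_ik, and sigma_ik for each class k and feature i."""
    classes = np.unique(y)
    pi = {k: np.mean(y == k) for k in classes}
    mu = {k: X[y == k].mean(axis=0) for k in classes}
    sigma = {k: X[y == k].std(axis=0) + 1e-9 for k in classes}   # small floor avoids division by zero
    return pi, mu, sigma

def classify_gnb(x_test, pi, mu, sigma):
    """argmax_k log pi_k + sum_i log Normal(x_i; mu_ik, sigma_ik); logs avoid numerical underflow."""
    def log_normal(x, m, s):
        return -0.5 * np.log(2 * np.pi * s ** 2) - (x - m) ** 2 / (2 * s ** 2)
    scores = {k: np.log(pi[k]) + log_normal(x_test, mu[k], sigma[k]).sum() for k in pi}
    return max(scores, key=scores.get)
```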
Things to remember
• Probability basics
  • Conditional probability, joint probability, Bayes rule
• Estimating parameters from data
  • Maximum likelihood (MLE): maximize P(Data | θ)
  • Maximum a posteriori (MAP): maximize P(θ | Data)
• Naïve Bayes
  P(Y = y_k | X_1, ⋯, X_n) ∝ P(Y = y_k) Π_i P(X_i | Y = y_k)
Logistic Regression
• Hypothesis representation
• Cost function
• Logistic regression with gradient descent
• Regularization
• Multi-class classification
(Plot: Malignant? 1 (Yes) / 0 (No) vs. Tumor Size)
h_θ(x) = θᵀx
• Threshold the classifier output h_θ(x) at 0.5:
  • If h_θ(x) ≥ 0.5, predict "y = 1"
  • If h_θ(x) < 0.5, predict "y = 0"
Slide credit: Andrew Ng
Classification: y = 1 or y = 0
h_θ(x) = θᵀx (from linear regression) can be > 1 or < 0
Logistic regression: 0 ≤ h_θ(x) ≤ 1
Logistic regression is actually a classification method, despite its name
Slide credit: Andrew Ng
Hypothesis representation
• Want 0 ≤ h_θ(x) ≤ 1
• h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)), so h_θ(x) = 1 / (1 + e^(−θᵀx))
• g(z) is the sigmoid function (also called the logistic function)
Slide credit: Andrew Ng
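A minimal sketch of this hypothesis in code; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)

print(sigmoid(0.0))                           # 0.5 -> the 0.5 threshold corresponds to theta^T x = 0
print(sigmoid(np.array([-4.0, 0.0, 4.0])))    # ≈ [0.018, 0.5, 0.982]
```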
Interpretation of hypothesis output
• h_θ(x) = estimated probability that y = 1 on input x
• Example: if x = [x_0; x_1] = [1; tumorSize] and h_θ(x) = 0.7,
  tell the patient there is a 70% chance of the tumor being malignant
Slide credit: Andrew Ng
Logistic regression
h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) and z = θᵀx
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, i.e., z = θᵀx ≥ 0,
and predict "y = 0" if h_θ(x) < 0.5, i.e., z = θᵀx < 0
Slide credit: Andrew Ng
Decision boundary
(Plot: Tumor Size vs. Age)
• h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)
  E.g., θ_0 = −3, θ_1 = 1, θ_2 = 1
• Predict "y = 1" if −3 + x_1 + x_2 ≥ 0
Slide credit: Andrew Ng
• h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_2²)
  E.g., θ_0 = −1, θ_1 = 0, θ_2 = 0, θ_3 = 1, θ_4 = 1
• Predict "y = 1" if −1 + x_1² + x_2² ≥ 0
• h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1² + θ_4 x_1² x_2 + θ_5 x_1² x_2² + θ_6 x_1³ x_2 + ⋯)
Slide credit: Andrew Ng
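A small sketch that checks both decision boundaries with the example parameter values above; the predict helper and the sample points are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, features):
    """Predict y = 1 iff h_theta(x) = g(theta^T x) >= 0.5, i.e., theta^T x >= 0."""
    return int(sigmoid(np.dot(theta, features)) >= 0.5)

# Linear boundary from the previous slide: theta = (-3, 1, 1) -> predict 1 iff x1 + x2 >= 3
theta_lin = np.array([-3.0, 1.0, 1.0])
print(predict(theta_lin, np.array([1.0, 1.0, 1.0])))   # 0: x1 + x2 = 2 < 3
print(predict(theta_lin, np.array([1.0, 2.0, 2.0])))   # 1: x1 + x2 = 4 >= 3

# Quadratic boundary: theta = (-1, 0, 0, 1, 1) on features (1, x1, x2, x1^2, x2^2)
# -> predict 1 iff x1^2 + x2^2 >= 1 (outside the unit circle)
theta_quad = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
x1, x2 = 0.5, 0.5
print(predict(theta_quad, np.array([1.0, x1, x2, x1**2, x2**2])))   # 0: inside the circle
```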