Logistic Regression

Required reading:
• Mitchell draft chapter (see course website)
Recommended reading:
• Bishop, Chapter 3.1.3, 3.1.4
• Ng and Jordan paper (see course website)

Machine Learning 10-701
Tom M. Mitchell
Center for Automated Learning and Discovery
Carnegie Mellon University
September 29, 2005
Naïve Bayes: What you should know

• Designing classifiers based on Bayes rule
• Conditional independence
  – What it is
  – Why it’s important
• Naïve Bayes assumption and its consequences
  – Which (and how many) parameters must be estimated under different generative models (different forms for P(X|Y))
• How to train Naïve Bayes classifiers
  – MLE and MAP estimates
  – with discrete and/or continuous inputs
Generative vs. Discriminative Classifiers

Wish to learn f: X → Y, or P(Y|X)

Generative classifiers (e.g., Naïve Bayes):
• Assume some functional form for P(X|Y), P(Y)
  – this is the ‘generative’ model
• Estimate parameters of P(X|Y), P(Y) directly from training data
• Use Bayes rule to calculate P(Y|X = x_i)

Discriminative classifiers:
• Assume some functional form for P(Y|X)
  – this is the ‘discriminative’ model
• Estimate parameters of P(Y|X) directly from training data
• Consider learning f: X → Y, where
  • X is a vector of real-valued features, <X_1 ... X_n>
  • Y is boolean
• We could use a Gaussian Naïve Bayes classifier
  • assume all X_i are conditionally independent given Y
  • model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_ik)
  • model P(Y) as Bernoulli(π)
• What does that imply about the form of P(Y|X)?
• Consider learning f: X → Y, where
  • X is a vector of real-valued features, <X_1 ... X_n>
  • Y is boolean
• assume all X_i are conditionally independent given Y
• model P(X_i | Y = y_k) as Gaussian N(μ_ik, σ_i)
• model P(Y) as Bernoulli(π)
• What does that imply about the form of P(Y|X)?
Derive form for P(Y|X) for continuous X_i
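The algebra for this slide is not reproduced in the extracted text. Below is a sketch of the standard derivation under the assumptions of the previous slide (conditional independence, P(X_i | Y = y_k) = N(μ_ik, σ_i), P(Y = 1) = π); the weight expressions come from collecting the terms linear in X_i and the constants.

```latex
P(Y=1\mid X)
  = \frac{P(Y=1)\,P(X\mid Y=1)}{P(Y=1)\,P(X\mid Y=1)+P(Y=0)\,P(X\mid Y=0)}
  = \frac{1}{1+\exp\!\left(\ln\frac{P(Y=0)}{P(Y=1)}+\sum_i \ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\right)}
```

Substituting the Gaussian densities, the quadratic X_i^2 terms cancel because σ_i does not depend on the class, leaving

```latex
P(Y=1\mid X)=\frac{1}{1+\exp\!\left(w_0+\sum_i w_i X_i\right)},\qquad
w_i=\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^{2}},\qquad
w_0=\ln\frac{1-\pi}{\pi}+\sum_i\frac{\mu_{i1}^{2}-\mu_{i0}^{2}}{2\sigma_i^{2}}
```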
Very convenient!

P(Y=1 | X=<X_1 ... X_n>) = 1 / (1 + exp(w_0 + Σ_i w_i X_i))

implies

P(Y=0 | X=<X_1 ... X_n>) = exp(w_0 + Σ_i w_i X_i) / (1 + exp(w_0 + Σ_i w_i X_i))

implies

P(Y=0|X) / P(Y=1|X) = exp(w_0 + Σ_i w_i X_i)

implies

ln [ P(Y=0|X) / P(Y=1|X) ] = w_0 + Σ_i w_i X_i

→ linear classification rule!
Logistic function
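The plot on this slide is not reproduced in the extracted text. For reference, a standard statement of the logistic (sigmoid) function and the properties the rest of the lecture relies on:

```latex
\sigma(z)=\frac{1}{1+e^{-z}},\qquad \sigma(z)\in(0,1),\qquad
\sigma(-z)=1-\sigma(z),\qquad \frac{d\sigma}{dz}=\sigma(z)\,\bigl(1-\sigma(z)\bigr)
```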
Logistic regression more generally

• Logistic regression in the more general case, where Y ∈ {Y_1 ... Y_R}: learn R−1 sets of weights

for k < R:
  P(Y = Y_k | X) = exp(w_k0 + Σ_i w_ki X_i) / (1 + Σ_{j=1}^{R−1} exp(w_j0 + Σ_i w_ji X_i))

for k = R:
  P(Y = Y_R | X) = 1 / (1 + Σ_{j=1}^{R−1} exp(w_j0 + Σ_i w_ji X_i))
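As a concrete illustration of the two cases above, here is a minimal sketch (my own code, not from the course; the array names and shapes are illustrative assumptions):

```python
# Multiclass logistic regression probabilities with R-1 weight vectors;
# class R acts as the reference class with implicit weights of zero.
import numpy as np

def multiclass_lr_probs(W, b, x):
    """W: (R-1, n) weight matrix, b: (R-1,) intercepts, x: (n,) feature vector.
    Returns [P(Y=Y_1|x), ..., P(Y=Y_R|x)]."""
    scores = np.exp(b + W @ x)          # exp(w_k0 + sum_i w_ki x_i) for k < R
    denom = 1.0 + scores.sum()          # 1 + sum_{j<R} exp(w_j0 + sum_i w_ji x_i)
    return np.append(scores / denom, 1.0 / denom)
```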
Training Logistic Regression: MCLE

• Choose parameters W = <w_0, ..., w_n> to maximize conditional likelihood of training data

  W ← arg max_W Π_l P(Y^l | X^l, W)

where
• Training data D = {<X^1, Y^1>, ..., <X^L, Y^L>}
• Data likelihood = Π_l P(X^l, Y^l | W)
• Data conditional likelihood = Π_l P(Y^l | X^l, W)
Expressing Conditional Log Likelihood
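The algebra for this slide is not reproduced in the extracted text. A sketch of the standard expansion, writing superscript l for the l-th training example and taking the parameterization P(Y=1 | X, W) = exp(w_0 + Σ_i w_i X_i) / (1 + exp(w_0 + Σ_i w_i X_i)) (the same family as earlier, with the weights negated):

```latex
l(W) \;=\; \sum_l \ln P(Y^l \mid X^l, W)
     \;=\; \sum_l Y^l \ln P(Y^l{=}1 \mid X^l, W) + (1 - Y^l)\,\ln P(Y^l{=}0 \mid X^l, W)
     \;=\; \sum_l Y^l\Bigl(w_0 + \sum_i w_i X_i^l\Bigr) \;-\; \ln\Bigl(1 + \exp\bigl(w_0 + \sum_i w_i X_i^l\bigr)\Bigr)
```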
Maximizing Conditional Log Likelihood

Good news: l(W) is a concave function of W
Bad news: no closed-form solution to maximize l(W)
Maximize Conditional Log Likelihood: Gradient Ascent

∂l(W)/∂w_i = Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) )

Gradient ascent algorithm: iterate until change < ε
For all i, repeat:
  w_i ← w_i + η Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) )
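A minimal runnable sketch of this procedure (my own code, not the course's; eta, n_iters, and the 1/L scaling are illustrative choices, and P(Y=1|X,W) is parameterized as sigmoid(w_0 + Σ_i w_i X_i)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, eta=0.1, n_iters=1000, tol=1e-6):
    """X: (L, n) array of real-valued features; y: (L,) array of 0/1 labels."""
    L, n = X.shape
    Xb = np.hstack([np.ones((L, 1)), X])      # prepend a column of 1s so w[0] plays the role of w_0
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        p = sigmoid(Xb @ w)                    # P_hat(Y^l = 1 | X^l, W) for every example
        grad = Xb.T @ (y - p)                  # sum_l X_i^l (Y^l - P_hat(Y^l=1 | X^l, W))
        w_new = w + eta * grad / L             # ascent step (scaled by 1/L, a practical choice)
        if np.max(np.abs(w_new - w)) < tol:    # iterate until change < epsilon
            return w_new
        w = w_new
    return w

# Usage sketch: P(Y=1 | x_new) is then sigmoid(np.hstack([1.0, x_new]) @ w)
```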
That’s all M(C)LE. How about MAP?

• One common approach is to define priors on W
  – Normal distribution, zero mean, identity covariance
• Helps avoid very large weights and overfitting
• MAP estimate:

  W ← arg max_W ln [ P(W) Π_l P(Y^l | X^l, W) ]
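With a zero-mean Gaussian prior P(W) ∝ exp(−(λ/2) Σ_i w_i²), this objective becomes a penalized conditional log likelihood, and the gradient-ascent update gains a shrinkage term (λ here is my notation for the inverse prior variance, not a symbol from the slides):

```latex
W_{MAP} \;=\; \arg\max_W \;\Bigl[-\frac{\lambda}{2}\sum_i w_i^{2} \;+\; \sum_l \ln P(Y^l \mid X^l, W)\Bigr],
\qquad
w_i \;\leftarrow\; w_i + \eta\Bigl(-\lambda\, w_i + \sum_l X_i^l\bigl(Y^l - \hat{P}(Y^l{=}1\mid X^l, W)\bigr)\Bigr)
```

Dropping the constant from the Gaussian normalizer leaves the familiar L2 penalty, i.e. the ‘regularization’ mentioned on the summary slide.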
MLE vs MAP

• Maximum conditional likelihood estimate

  W ← arg max_W Σ_l ln P(Y^l | X^l, W)

• Maximum a posteriori estimate

  W ← arg max_W [ ln P(W) + Σ_l ln P(Y^l | X^l, W) ]
Naïve Bayes vs. Logistic Regression [Ng & Jordan, 2002]

• Generative and Discriminative classifiers
• Asymptotic comparison (# training examples → infinity)
  • when model correct
  • when model incorrect
• Non-asymptotic analysis
  • convergence rate of parameter estimates
  • convergence rate of expected error
• Experimental results
Naïve Bayes vs Logistic Regression

Consider Y and X_i boolean, X = <X_1 ... X_n>

Number of parameters:
• NB: 2n + 1
• LR: n + 1

Estimation method:
• NB parameter estimates are uncoupled
• LR parameter estimates are coupled
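A quick accounting of those counts (my arithmetic, not spelled out on the slide), for boolean Y and boolean X_i:

```latex
\underbrace{1}_{P(Y=1)} \;+\; \underbrace{2n}_{P(X_i=1\mid Y=0),\ P(X_i=1\mid Y=1)\ \text{for each of the } n \text{ features}} \;=\; 2n+1
\qquad\text{vs.}\qquad
\underbrace{1}_{w_0} \;+\; \underbrace{n}_{w_1,\dots,w_n} \;=\; n+1
```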
What is the difference asymptotically?

Notation: let ε(h_{A,m}) denote the error of the hypothesis learned via algorithm A from m examples

• If the assumed Naïve Bayes model is correct, then the two converge to the same asymptotic error: ε(h_{Dis,∞}) = ε(h_{Gen,∞})
• If the assumed model is incorrect: ε(h_{Dis,∞}) ≤ ε(h_{Gen,∞})

Note: the assumed discriminative model can be correct even when the generative model is incorrect, but not vice versa
Rate of convergence: logistic regression

Let h_{Dis,m} be logistic regression trained on m examples in n dimensions. Then with high probability:

  ε(h_{Dis,m}) ≤ ε(h_{Dis,∞}) + O( √( (n/m) log(m/n) ) )

Implication: if we want ε(h_{Dis,m}) ≤ ε(h_{Dis,∞}) + ε_0 for some constant ε_0, it suffices to pick m = O(n)

→ Logistic regression converges to its asymptotic classifier in order n examples

(result follows from Vapnik’s structural risk bound, plus the fact that the VC dimension of n-dimensional linear separators is n)
Rate of convergence: naïve Bayes

Consider first how quickly the parameter estimates converge toward their asymptotic values. Then we’ll ask how this influences the rate of convergence toward the asymptotic classification error.
Rate of convergence: naïve Bayes parameters

The Ng & Jordan result: with high probability, the Naïve Bayes parameter estimates come within ε of their asymptotic values after m = O(log n) examples, so Naïve Bayes approaches its asymptotic classification error in order log n examples (versus order n for logistic regression), even though that asymptotic error may be higher.
Some experiments from UCI data sets

[Figures from Ng & Jordan, 2002: test error versus number of training examples for naïve Bayes and logistic regression on several UCI data sets; not reproduced here]
What you should know:

• Logistic regression
  – Functional form follows from Naïve Bayes assumptions
  – But training procedure picks parameters without the conditional independence assumption
  – MLE training: pick W to maximize P(Y | X, W)
  – MAP training: pick W to maximize P(W | X, Y)
    • ‘regularization’
• Gradient ascent/descent
  – General approach when closed-form solutions are unavailable
• Generative vs. Discriminative classifiers
  – Bias vs. variance tradeoff