Lecture 6 – Logistic Regression
CS 335
Dan Sheldon

◮ Classification
◮ Model
◮ Cost function
◮ Gradient descent
◮ Linear classifiers and decision boundaries

Classification

◮ Input: x ∈ R^n
◮ Output: y ∈ {0, 1}

Example: Hand-Written Digits

Input: a 20 × 20 grayscale image. Unroll the image into a feature vector x ∈ R^400:

    x = (x_1, ..., x_400)^T

where the pixels are indexed column by column:

    x_1    x_21   ...   x_381
    x_2    x_22   ...   x_382
    ...
    x_20   x_40   ...   x_400

(A short sketch of this unrolling step appears after this group of slides.)

Output:

    y = 0 if the digit is a "four", y = 1 if the digit is a "nine".

Example: Document Classification

Discuss on board.

The Learning Problem

◮ Input: x ∈ R^n
◮ Output: y ∈ {0, 1}
◮ Model (hypothesis class): ?
◮ Cost function: ?
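As a concrete illustration of the unrolling step, here is a minimal NumPy sketch (my own, not course code; it assumes the column-by-column ordering implied by the index pattern above):

```python
import numpy as np

# A 20 x 20 grayscale image (random values stand in for real pixel data).
image = np.random.rand(20, 20)

# Unroll column by column, matching the index pattern on the slide:
# x_1..x_20 are the first column, x_21..x_40 the second, and so on.
x = image.flatten(order="F")   # feature vector x in R^400
print(x.shape)                 # (400,)
```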
Classification as Regression?

Discuss on board.

The Model

Exercise: fix the linear regression model.

    h_θ(x) = g(θ^T x),   g : R → [0, 1].

What should g look like?

Logistic Function

    g(z) = 1 / (1 + e^(-z))

[Plot: g(z) for z from -20 to 20; the curve rises from 0 to 1 and passes through g(0) = 0.5.]

◮ This is called the logistic or sigmoid function: g(z) = logistic(z) = sigmoid(z).

The Model

Put it together:

    h_θ(x) = logistic(θ^T x) = 1 / (1 + e^(-θ^T x))

Nuance:
◮ The output is in [0, 1], not {0, 1}.
◮ Interpret it as a probability.

Hypothesis vs. Prediction Rule

Hypothesis (for learning, or when a probability is useful): h_θ(x).

[Plot: h_θ(x) vs. θ^T x, the logistic curve.]

Prediction rule (when you need to commit!):

    y = 0 if h_θ(x) < 1/2,   y = 1 if h_θ(x) ≥ 1/2.

Equivalent rule:

    y = 0 if θ^T x < 0,   y = 1 if θ^T x ≥ 0.

[Plot: predicted y vs. θ^T x, a step from 0 to 1 at θ^T x = 0.]
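To make the model and prediction rule concrete, here is a minimal NumPy sketch (an illustration, not course code; the function names are mine):

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = logistic(theta^T x) for each row of X (X includes a column of ones)."""
    return logistic(X @ theta)

def predict(theta, X):
    """Prediction rule: y = 1 if h_theta(x) >= 1/2, equivalently theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)
```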
The Model – Big Picture

Illustrate on board: x → z → p → y. (MATLAB visualization.)

Cost Function

Can we use squared error?

    J(θ) = Σ_i (h_θ(x^(i)) - y^(i))^2

This is sometimes done. But we want to do better.

Cost Function

Let's explore further. For squared error, we can write

    J(θ) = Σ_{i=1}^m cost(h_θ(x^(i)), y^(i)),   where cost(p, y) = (p - y)^2.

Here cost(p, y) is the cost of predicting h_θ(x) = p when the true value is y.

Cost Function

Suppose y = 1. For squared error, cost(p, 1) looks like this:

[Plot: squared error vs. h_θ(x) on [0, 1].]

If we undo the logistic transform, it looks like this:

[Plot: squared error vs. θ^T x.]

Exercise: fix these.

Log Loss (y = 1)

    cost(p, 1) = -log p

[Plots: log loss vs. h_θ(x) and vs. θ^T x.]

◮ Recall that y = 1 is the correct answer.
◮ As z = θ^T x → ∞, then p → 1, so the prediction gets better and better. The cost approaches zero.
◮ As z = θ^T x → -∞, then p → 0, so the prediction gets worse and worse. The cost grows without bound.
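A tiny sketch (mine, not from the lecture) that tabulates both costs for y = 1 and reproduces the shapes of the plots above: squared error stays bounded, while the log loss blows up as p → 0:

```python
import numpy as np

p = np.linspace(0.01, 0.99, 9)   # predicted probabilities p = h_theta(x)
squared = (p - 1.0) ** 2         # squared error cost with y = 1 (bounded by 1)
logloss = -np.log(p)             # log loss cost with y = 1 (unbounded as p -> 0)

for pi, s, l in zip(p, squared, logloss):
    print(f"p = {pi:.2f}   squared error = {s:.3f}   log loss = {l:.3f}")
```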
Log Loss

    cost(p, y) = -log p        if y = 1
                 -log(1 - p)   if y = 0

[Plots: the two branches of the log loss vs. h_θ(x) and vs. θ^T x.]

Equivalent Expression for Log Loss

    cost(p, y) = -y log p - (1 - y) log(1 - p)

    cost(h_θ(x), y) = -y log h_θ(x) - (1 - y) log(1 - h_θ(x))

Review So Far

◮ Input: x ∈ R^n
◮ Output: y ∈ {0, 1}
◮ Model (hypothesis class):

    h_θ(x) = logistic(θ^T x) = 1 / (1 + e^(-θ^T x))

◮ Cost function:

    J(θ) = Σ_{i=1}^m [ -y^(i) log h_θ(x^(i)) - (1 - y^(i)) log(1 - h_θ(x^(i))) ]

TODO: optimize J(θ).

Gradient Descent for Logistic Regression

1. Initialize θ_0, θ_1, ..., θ_d arbitrarily.
2. Repeat until convergence:

    θ_j ← θ_j - α (∂/∂θ_j) J(θ),   j = 0, ..., d.

Partial derivatives for logistic regression (exercise):

    (∂/∂θ_j) J(θ) = Σ_{i=1}^m (h_θ(x^(i)) - y^(i)) x_j^(i)

(Same as linear regression! But h_θ(x) is different.)

(A minimal gradient-descent sketch appears after the next slide.)

Decision Boundaries

Example from R&N (Fig. 18.15).

[Figure 1: Earthquakes (white circles) vs. nuclear explosions (black circles), plotted by body wave magnitude (x_1) and surface wave magnitude (x_2).]

E.g., suppose the hypothesis is

    h(x_1, x_2) = logistic(1.7 x_1 - x_2 - 4.9)

Predict "nuclear explosion" if

    1.7 x_1 - x_2 - 4.9 ≥ 0,   i.e.,   x_2 ≤ 1.7 x_1 - 4.9.
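Putting the model, the log-loss cost, and the update rule together, here is a minimal batch gradient-descent sketch (an illustrative implementation, not course code; the learning rate, iteration count, and function names are my choices):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta): sum of log losses over the training set (X includes a column of ones)."""
    h = logistic(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.01, iters=5000):
    """Batch gradient descent: theta_j <- theta_j - alpha * dJ/dtheta_j."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = logistic(X @ theta)
        grad = X.T @ (h - y)      # dJ/dtheta_j = sum_i (h_theta(x_i) - y_i) * x_ij
        theta -= alpha * grad
        # cost(theta, X, y) can be printed here to monitor convergence
    return theta
```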
Linear Classifiers

Predict

    y = 0 if θ^T x < 0,   y = 1 if θ^T x ≥ 0.

The decision boundary θ^T x = 0 is a hyperplane!

Many other learning algorithms use linear classification rules:
◮ Perceptron
◮ Support vector machines (SVMs)
◮ Linear discriminants

Nonlinear Decision Boundaries by Feature Expansion

Example (Ng):

    (x_1, x_2) ↦ (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2),
    θ = (-1, 0, 0, 1, 1, 0)^T

Exercise: what does the decision boundary look like in the (x_1, x_2) plane? Watch out! (A small sketch follows the next slide.)

Note: Where Does Log Loss Come From?

    probability of y given p = p        if y = 1
                               1 - p    if y = 0

    cost(p, y) = -log(probability) = -log p        if y = 1
                                     -log(1 - p)   if y = 0

Find θ to minimize cost  ←→  Find θ to maximize probability
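A small sketch of the feature expansion above (mine, not from the slides): with the given θ, the score θ^T φ(x) works out to -1 + x_1^2 + x_2^2, so checking its sign at a few points traces out the decision boundary in the (x_1, x_2) plane:

```python
import numpy as np

def expand(x1, x2):
    """Feature map (x1, x2) -> (1, x1, x2, x1^2, x2^2, x1*x2)."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])

# theta^T phi(x) = -1 + x1^2 + x2^2: the boundary is the circle x1^2 + x2^2 = 1.
for x1, x2 in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (2.0, 0.0)]:
    z = theta @ expand(x1, x2)
    label = 1 if z >= 0 else 0
    print(f"(x1, x2) = ({x1}, {x2}):  theta^T phi(x) = {z:+.2f}  ->  predict y = {label}")
```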