Machine Learning (CSE 446): Probabilistic View of Logistic Regression and Linear Regression
Sham M Kakade © 2018 University of Washington
cse446-staff@cs.washington.edu
Announcements
◮ Midterm: Weds, Feb 7th. Policies:
  ◮ You may use a single side of a single sheet of handwritten notes that you prepared.
  ◮ You must turn your sheet of notes in, with your name on it, at the conclusion of the exam, even if you never looked at it.
  ◮ You may not use electronic devices of any sort.
◮ Today:
  ◮ Review: Regularization and Optimization
  ◮ New: (wrap up GD) + probabilistic modeling!
Review
Regularization / Ridge Regression
◮ Regularize the optimization problem:

    min_w (1/N) ∑_{n=1}^{N} (y_n − w · x_n)² + λ‖w‖²  =  min_w (1/N) ‖Y − X^⊤ w‖² + λ‖w‖²

◮ This particular case: “Ridge” regression, Tikhonov regularization.
◮ The solution is the (regularized) least squares estimator:

    ŵ_least squares = ( (1/N) X^⊤ X + λ I )^{−1} ( (1/N) X^⊤ Y )

◮ Regularization is often necessary for the “exact” solution method (regardless of whether d is larger or smaller than N).
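As a concrete illustration, here is a minimal NumPy sketch of the closed-form solution above. It assumes the data matrix X stores one example x_n per row (shape N × d); the function and variable names are ours, not the course's.

import numpy as np

def ridge_closed_form(X, Y, lam):
    # Closed form from the slide: ((1/N) X^T X + lam I)^{-1} ((1/N) X^T Y).
    # Assumes X has shape (N, d) with one example x_n per row.
    N, d = X.shape
    A = X.T @ X / N + lam * np.eye(d)
    b = X.T @ Y / N
    return np.linalg.solve(A, b)      # solve A w = b; more stable than forming the inverse

# Tiny usage example on synthetic data (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
Y = X @ w_true + 0.1 * rng.normal(size=100)
print(ridge_closed_form(X, Y, lam=0.1))   # roughly recovers w_true, shrunk slightly toward 0 by lambda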
Gradient Descent
◮ Want to solve:

    min_z F(z)

◮ How should we update z?
Gradient Descent
Algorithm 1: GradientDescent
  Data: function F : R^d → R, number of iterations K, step sizes (η^(1), ..., η^(K))
  Result: z ∈ R^d
  initialize: z^(0) = 0
  for k ∈ {1, ..., K} do
      z^(k) = z^(k−1) − η^(k) · ∇_z F(z^(k−1))
  end
  return z^(K)
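A direct Python rendering of Algorithm 1, assuming the caller supplies the gradient of F as a function (the names grad_F and etas are ours):

import numpy as np

def gradient_descent(grad_F, d, etas):
    # Algorithm 1: start at z^(0) = 0 and take one gradient step per step size.
    # grad_F: maps a vector z of shape (d,) to the gradient of F at z
    # etas:   the step sizes (eta^(1), ..., eta^(K))
    z = np.zeros(d)
    for eta in etas:
        z = z - eta * grad_F(z)
    return z

# Usage: minimize F(z) = ||z - c||^2, whose gradient is 2 (z - c).
c = np.array([1.0, -2.0, 3.0])
z_hat = gradient_descent(lambda z: 2 * (z - c), d=3, etas=[0.25] * 100)
print(z_hat)   # approximately equal to c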
Today
Gradient Descent: Convergence
◮ Denote:
    z^* = argmin_z F(z): the global minimum
    z^(k): our parameter after k updates.
◮ Thm: Suppose F is convex and “L-smooth”. Using a fixed step size η ≤ 1/L, we have:

    F(z^(k)) − F(z^*) ≤ ‖z^(0) − z^*‖² / (η · k)

  That is, the convergence rate is O(1/k).
◮ This Thm applies to both the square loss and logistic loss!
Proof intuition: smoothness and GD Convergence
◮ L-smooth functions: “The gradients don't change quickly.” Precisely, for all z, z′:

    ‖∇F(z) − ∇F(z′)‖ ≤ L‖z − z′‖

◮ Proof idea (a numerical sketch follows below):
  1. If our gradient is large, we will make good progress decreasing our function value.
  2. If our gradient is small, we must have value near the optimal value.
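To make the theorem on the previous slide concrete, here is a small numerical sketch for the (unregularized) square loss. The smoothness constant L is taken as the largest eigenvalue of (2/N) X^⊤ X; the synthetic data and names are ours, not the course's.

import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 5
X = rng.normal(size=(N, d))
Y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def F(w):                      # average square loss
    return np.mean((Y - X @ w) ** 2)

def grad_F(w):                 # gradient of the average square loss
    return (2.0 / N) * X.T @ (X @ w - Y)

L = np.linalg.eigvalsh((2.0 / N) * X.T @ X).max()   # a valid smoothness constant for F
eta = 1.0 / L

w_star, *_ = np.linalg.lstsq(X, Y, rcond=None)      # the exact minimizer z^*
w = np.zeros(d)                                     # z^(0) = 0
for k in range(1, 201):
    w = w - eta * grad_F(w)
    if k in (10, 50, 100, 200):
        gap = F(w) - F(w_star)
        bound = np.sum(w_star ** 2) / (eta * k)     # ||z^(0) - z^*||^2 / (eta * k)
        print(k, gap, bound)                        # the gap stays below the O(1/k) bound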
A better idea?
◮ Remember the Bayes optimal classifier. D(x, y) is the true probability of (x, y).

    f^(BO)(x) = argmax_y D(x, y) = argmax_y D(y | x)

◮ Of course, we don't have D(y | x). Probabilistic machine learning: define a probabilistic model relating random variables x to y and estimate its parameters.
A Probabilistic Model for Binary Classification: Logistic Regression
◮ For Y ∈ {−1, +1}, define p_{w,b}(Y | x) as follows:
  1. Transform the feature vector x via the “activation” function: a = w · x + b
  2. Transform a into a binomial probability by passing it through the logistic function:

      p_{w,b}(Y = +1 | x) = 1 / (1 + exp(−a)) = 1 / (1 + exp(−(w · x + b)))

  [Figure: the logistic function, rising from 0 toward 1 as its argument goes from −10 to 10.]
◮ If we learn p_{w,b}(Y | x), we can (almost) do whatever we like!
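A minimal NumPy sketch of this model; the helper names are ours, and the identity p_{w,b}(Y = y | x) = logistic(y · (w · x + b)) for y ∈ {−1, +1} is a standard consequence of the definition above.

import numpy as np

def logistic(a):
    # The logistic function 1 / (1 + exp(-a)).
    return 1.0 / (1.0 + np.exp(-a))

def p_y_given_x(y, x, w, b):
    # p_{w,b}(Y = y | x) for y in {-1, +1}.
    # Since p(Y = -1 | x) = 1 - p(Y = +1 | x) = 1 / (1 + exp(w.x + b)),
    # both cases collapse to logistic(y * (w.x + b)).
    return logistic(y * (np.dot(w, x) + b))

# Example: the two class probabilities sum to 1.
w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.5, -0.5])
print(p_y_given_x(+1, x, w, b), p_y_given_x(-1, x, w, b))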
Maximum Likelihood Estimation
The principle of maximum likelihood estimation is to choose our parameters to make our observed data as likely as possible (under our model).
◮ Mathematically: find ŵ that maximizes the probability of the labels y_1, ..., y_N given the inputs x_1, ..., x_N.
◮ Note, by the i.i.d. assumption:

    D(y_1, ..., y_N | x_1, ..., x_N) = ∏_{n=1}^{N} D(y_n | x_n)

◮ The Maximum Likelihood Estimator (the 'MLE') is:

    ŵ = argmax_w ∏_{n=1}^{N} p_w(y_n | x_n)
Maximum Likelihood Estimation and the Log Loss
◮ The 'MLE' is:

    ŵ = argmax_w ∏_{n=1}^{N} p_w(y_n | x_n)
      = argmax_w log ∏_{n=1}^{N} p_w(y_n | x_n)
      = argmax_w ∑_{n=1}^{N} log p_w(y_n | x_n)
      = argmin_w ∑_{n=1}^{N} − log p_w(y_n | x_n)

◮ This is referred to as the log loss.
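A quick numerical aside (ours, not from the slides) on why the switch from the product to the sum of logs also matters computationally: the likelihood of even a moderately sized dataset underflows in floating point, while the log loss does not.

import numpy as np

probs = np.full(10_000, 0.9)        # pretend p_w(y_n | x_n) = 0.9 for 10,000 examples
print(np.prod(probs))               # 0.0 -- the product underflows
print(-np.sum(np.log(probs)))       # about 1053.6 -- the summed log loss is well behaved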
The MLE for Logistic Regression
◮ The MLE for the logistic regression model:

    argmin_w ∑_{n=1}^{N} − log p_w(y_n | x_n) = argmin_w ∑_{n=1}^{N} log(1 + exp(−y_n w · x_n))

◮ This is the logistic loss function that we saw earlier.
◮ How do we find the MLE?
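There is no closed form for this minimizer, so one natural answer (consistent with the earlier convergence slide) is to run Algorithm 1 on the logistic loss. A sketch, where the gradient expression −∑_n y_n x_n / (1 + exp(y_n w · x_n)) is derived by us from the loss above and the data is synthetic:

import numpy as np

def logistic_loss(w, X, y):
    # sum_n log(1 + exp(-y_n * w . x_n)); X is (N, d), y has entries in {-1, +1}.
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))    # numerically stable log(1 + exp(-m))

def logistic_loss_grad(w, X, y):
    # Gradient of the loss above: -sum_n y_n x_n / (1 + exp(y_n * w . x_n)).
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))
    return X.T @ coef

# Find the MLE by gradient descent on a small synthetic problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200))
w, eta = np.zeros(3), 0.01
for _ in range(500):
    w = w - eta * logistic_loss_grad(w, X, y)
print(w, logistic_loss(w, X, y))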
Derivation of the Log Loss for Logistic Regression: scratch space (a worked sketch follows below)
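One way to fill in the scratch space (our sketch; the bias b is folded into w by appending a constant feature, which is an assumption of this sketch). For the model p_w(Y = +1 | x) = 1 / (1 + exp(−w · x)), we also have

    p_w(Y = −1 | x) = 1 − 1 / (1 + exp(−w · x)) = 1 / (1 + exp(+w · x)),

so both cases combine into p_w(y | x) = 1 / (1 + exp(−y w · x)) for y ∈ {−1, +1}, and therefore

    − log p_w(y | x) = log(1 + exp(−y w · x)),

which is exactly the per-example term on the previous slide.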
Linear Regression as a Probabilistic Model
Linear regression defines p_w(Y | x) as follows:
  1. Observe the feature vector x; transform it via the activation function: µ = w · x
  2. Let µ be the mean of a normal distribution and define the density:

      p_w(Y | x) = (1 / (σ √(2π))) exp( −(Y − µ)² / (2σ²) )

  3. Sample Y from p_w(Y | x).
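A minimal sketch of this generative model in NumPy (the function names and example values are ours):

import numpy as np

def sample_y(x, w, sigma, rng):
    # Steps 1-3: compute mu = w . x, then sample Y ~ Normal(mu, sigma^2).
    mu = np.dot(w, x)
    return rng.normal(loc=mu, scale=sigma)

def log_density(y, x, w, sigma):
    # log p_w(y | x) = -(y - w.x)^2 / (2 sigma^2) - log(sigma * sqrt(2 pi)).
    mu = np.dot(w, x)
    return -(y - mu) ** 2 / (2.0 * sigma ** 2) - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
w, sigma = np.array([1.0, -2.0]), 0.5
x = np.array([0.3, 0.7])
y = sample_y(x, w, sigma, rng)
print(y, log_density(y, x, w, sigma))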
Linear Regression MLE is (Unregularized) Squared Loss Minimization!

    argmin_w ∑_{n=1}^{N} − log p_w(y_n | x_n)  ≡  argmin_w (1/N) ∑_{n=1}^{N} (y_n − w · x_n)²

  (each term (y_n − w · x_n)² is SquaredLoss_n(w, b))

Where did the variance go?
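One way to answer the question (our sketch, using the density from the previous slide): taking logs,

    − log p_w(y_n | x_n) = (y_n − w · x_n)² / (2σ²) + log(σ √(2π)),

so

    ∑_{n=1}^{N} − log p_w(y_n | x_n) = (1 / (2σ²)) ∑_{n=1}^{N} (y_n − w · x_n)² + N log(σ √(2π)).

For any fixed σ > 0, the additive constant and the positive factor 1/(2σ²) do not change which w attains the minimum, so the MLE for w coincides with the unregularized squared-loss minimizer; the variance would only matter if we also wanted to estimate σ.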