Machine Learning: Logistic Regression. Hamid R. Rabiee, Spring 2015.



  • Machine Learning: Logistic Regression  Hamid R. Rabiee  Spring 2015  http://ce.sharif.edu/courses/93-94/2/ce717-1/

  • Agenda  Probabilistic Classification  Introduction to Logistic regression  Binary logistic regression  Logistic regression: Decision surface  Logistic regression: ML estimation  Logistic regression: Gradient descent  Logistic regression: multi-class  Logistic Regression: Regularization  Logistic Regression vs. Naïve Bayes Sharif University of Technology, Computer Engineering Department, Machine Learning Course

  • Probabilistic Classification  Generative probabilistic classification (previous lecture)  motivation: assume a distribution for each class and try to find the parameters of the distributions  cons: need to assume distributions; need to fit many parameters  Discriminative approach: Logistic regression (focus of today)  motivation: like least squares, but assume a logistic model y(x) = σ(wᵀx); classify based on whether y(x) > 0.5  technique: gradient descent

  • Introduction to Logistic regression  Logistic regression represents the probability of category i using a linear function of the input variables:  The name comes from the logit transformation:

  • Binary logistic regression  Logistic Regression assumes a parametric form for the distribution P(Y|X), then directly estimates its parameters from the training data. The parametric model assumed by Logistic Regression in the case where Y is Boolean is given by equations (1) and (2).  Notice that equation (2) follows directly from equation (1), because the sum of these two probabilities must equal 1.
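Equations (1) and (2) themselves were lost in conversion; the standard parametric form for Boolean Y, chosen here so that it is term-by-term consistent with the log-likelihood derived on a later slide, is:

```latex
P(Y=1 \mid X, \mathbf{w}) \;=\; \frac{\exp\!\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)}{1 + \exp\!\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)} \tag{1}
\qquad
P(Y=0 \mid X, \mathbf{w}) \;=\; \frac{1}{1 + \exp\!\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)} \tag{2}
```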

  • Binary logistic regression  We only need one set of parameters:  Sigmoid (logistic) function
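As a minimal sketch in Python (NumPy assumed; the function names and the convention of storing the bias w0 in `w[0]` are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    # P(Y=1 | x) = sigmoid(w0 + sum_i w_i x_i); w[0] holds the bias w0
    return sigmoid(w[0] + X @ w[1:])
```

Note that sigmoid(0) = 0.5, which is exactly the decision threshold used later.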

  • Logistic regression vs. Linear regression  Adapted from slides of John Whitehead

  • Logistic regression: Decision surface  Given a logistic regression model 𝒘 and an input 𝒙:  Decision surface 𝑔(𝒙; 𝒘) = constant  Decision surfaces are linear functions of 𝒙  Decision making on Y:
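The decision rule can be sketched as follows (a hypothetical helper, assuming the bias w0 is stored in `w[0]`): since the sigmoid is monotonic, P(Y=1|x) > 0.5 holds exactly when the linear score is positive, so the decision surface is the hyperplane where the score is zero.

```python
import numpy as np

def decide(w, x):
    # Decision surface: w0 + w.x = 0 is a hyperplane in input space.
    # Predict Y=1 exactly when P(Y=1|x) > 0.5, i.e. when w0 + w.x > 0.
    return int(w[0] + np.dot(w[1:], x) > 0)
```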

  • Computing the likelihood in detail  We can re-express the log of the conditional likelihood as:

l(\mathbf{w}) = \sum_l \Big[ y^l \ln P(y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) + (1 - y^l) \ln P(y^l{=}0 \mid \mathbf{x}^l, \mathbf{w}) \Big]

= \sum_l \Big[ y^l \ln \frac{P(y^l{=}1 \mid \mathbf{x}^l, \mathbf{w})}{P(y^l{=}0 \mid \mathbf{x}^l, \mathbf{w})} + \ln P(y^l{=}0 \mid \mathbf{x}^l, \mathbf{w}) \Big]

= \sum_l \Big[ y^l \Big( w_0 + \sum_{i=1}^{n} w_i x_i^l \Big) - \ln\Big( 1 + \exp\Big( w_0 + \sum_{i=1}^{n} w_i x_i^l \Big) \Big) \Big]
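The simplified final form of this derivation translates directly into a few lines of NumPy (a sketch; the function name and bias-in-`w[0]` layout are assumptions):

```python
import numpy as np

def log_likelihood(w, X, y):
    # l(w) = sum_l [ y^l * z^l - ln(1 + exp(z^l)) ],  z^l = w0 + sum_i w_i x_i^l
    # log1p(exp(z)) computes ln(1 + exp(z)) more accurately for small z
    z = w[0] + X @ w[1:]
    return np.sum(y * z - np.log1p(np.exp(z)))
```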

  • Logistic regression: ML estimation  l(w) is concave in w  What are concave and convex functions?  No closed-form solution
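The question above can be answered with the standard definition: f is concave when every chord lies on or below the graph, and convex when the inequality is reversed:

```latex
f\big(\lambda a + (1-\lambda) b\big) \;\ge\; \lambda f(a) + (1-\lambda) f(b)
\quad \text{for all } a, b \text{ and } \lambda \in [0, 1] \qquad \text{(concave; } \le \text{ for convex)}
```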

  • Optimizing concave/convex functions  Maximum of a concave function = minimum of a convex function  Gradient ascent (concave) / gradient descent (convex)

  • Gradient ascent / Gradient descent  For a function f(w):  If f is concave: gradient ascent rule  If f is convex: gradient descent rule

  • Logistic regression: Gradient descent  Iteratively updating the weights in this fashion increases likelihood each round.  We eventually reach the maximum  We are near the maximum when changes in the weights are small.  Thus, we can stop when the sum of the absolute values of the weight differences is less than some small number.
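The iterative update and the stopping criterion described above can be sketched as follows (a minimal NumPy implementation; the function name, step size `eta`, tolerance `tol`, and iteration cap are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, tol=1e-6, max_iter=10000):
    # Gradient ascent on the concave conditional log-likelihood.
    # Update: w_i <- w_i + eta * sum_l x_i^l (y^l - P(y^l=1 | x^l, w))
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        grad = Xb.T @ (y - sigmoid(Xb @ w))
        w_new = w + eta * grad
        # Stop when the sum of absolute weight changes is small
        if np.sum(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w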

  • Logistic regression: multi-class  In the two-class case  For multiclass, we work with the soft-max (softmax) function instead of the logistic sigmoid
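A minimal sketch of the soft-max function (the max-subtraction trick is a standard numerical-stability choice, not from the slides):

```python
import numpy as np

def softmax(z):
    # P(Y=k | x) = exp(z_k) / sum_j exp(z_j); generalizes the sigmoid to K classes.
    # Subtracting max(z) avoids overflow without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

For K = 2, softmax of the two class scores reduces to the logistic sigmoid of their difference.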

  • Logistic Regression: Regularization  Overfitting the training data is a problem that can arise in Logistic Regression, especially when the data is very high-dimensional and sparse.  One approach to reducing overfitting is regularization, in which we create a modified "penalized log likelihood function" that penalizes large values of w:

\mathbf{w} = \arg\max_{\mathbf{w}} \sum_l \ln P(y^l \mid \mathbf{x}^l, \mathbf{w}) - \frac{\lambda}{2} \|\mathbf{w}\|^2

 The derivative of this penalized log likelihood function is similar to our earlier derivative, with one additional penalty term:

\frac{\partial l(\mathbf{w})}{\partial w_i} = \sum_l x_i^l \Big( y^l - \hat{P}(y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) \Big) - \lambda w_i

 which gives us the modified gradient ascent rule:

w_i \leftarrow w_i + \eta \sum_l x_i^l \Big( y^l - \hat{P}(y^l{=}1 \mid \mathbf{x}^l, \mathbf{w}) \Big) - \eta \lambda w_i
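One step of this modified rule can be sketched as below (a hypothetical helper; here X is assumed to already include a bias column, and for brevity the penalty is applied to all weights, although in practice the bias is often left unpenalized):

```python
import numpy as np

def regularized_step(w, X, y, eta=0.1, lam=1.0):
    # One gradient-ascent step on the penalized log-likelihood
    #   l(w) - (lambda/2) ||w||^2 :
    # w_i <- w_i + eta * [ sum_l x_i^l (y^l - P(y^l=1|x^l,w)) - lambda * w_i ]
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # P(y=1 | x, w) for every example
    return w + eta * (X.T @ (y - p) - lam * w)
```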

  • Logistic Regression vs. Naïve Bayes  In general, NB and LR make different assumptions  NB: features independent given class -> assumption on P(X|Y)  LR: functional form of P(Y|X), no assumption on P(X|Y)  LR is a linear classifier  decision rule is a hyperplane  LR optimized by conditional likelihood  no closed-form solution  concave -> global optimum with gradient ascent

  • Logistic Regression vs. Naïve Bayes  Consider Y and Xi Boolean, X = <X1, ..., Xn>  Number of parameters:  NB: 2n+1  LR: n+1  Estimation method:  NB parameter estimates are uncoupled  LR parameter estimates are coupled

  • Logistic Regression vs. Gaussian Naïve Bayes  When the GNB modeling assumptions do not hold, Logistic Regression and GNB typically learn different classifier functions.  While Logistic Regression is consistent with the Naïve Bayes assumption that the input features Xi are conditionally independent given Y, it is not rigidly tied to this assumption as Naïve Bayes is.  GNB parameter estimates converge toward their asymptotic values in order log n examples, where n is the dimension of X. Logistic Regression parameter estimates converge more slowly, requiring order n examples.

  • Summary  Logistic Regression learns the conditional probability distribution P(y|x)  Local search:  begins with an initial weight vector  modifies it iteratively to maximize an objective function  The objective function is the conditional log likelihood of the data, so the algorithm seeks the probability distribution P(y|x) that is most likely given the data.

  • Any Questions?  End of Lecture 9  Thank you!  Spring 2015  http://ce.sharif.edu/courses/93-94/2/ce717-1/