

  1. Machine Learning: Logistic Regression. Hamid R. Rabiee, Spring 2015. http://ce.sharif.edu/courses/93-94/2/ce717-1/

  2. Agenda
   Probabilistic Classification
   Introduction to Logistic regression
   Binary logistic regression
   Logistic regression: Decision surface
   Logistic regression: ML estimation
   Logistic regression: Gradient descent
   Logistic regression: Multi-class
   Logistic Regression: Regularization
   Logistic Regression vs. Naïve Bayes

  3. Probabilistic Classification
   Generative probabilistic classification (previous lecture)
   Motivation: assume a distribution for each class and try to find the parameters for the distributions
   Cons: need to assume distributions; need to fit many parameters
   Discriminative approach: Logistic regression (focus of today)
   Motivation: like least squares, but assume a logistic form, y(x) = σ(w^T x); classify based on whether y(x) > 0.5
   Technique: gradient descent

  4. Introduction to Logistic regression
   Logistic regression represents the probability of category i using a linear function of the input variables
   The name comes from the logit transformation (see below)
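The equations on this slide were rendered as images. For the binary case, the logit transformation that gives the model its name takes the standard form below; the model asserts that the logit (log-odds) of P(Y = 1 | x) is linear in x:

      \mathrm{logit}(p) = \ln \frac{p}{1 - p}, \qquad
      \ln \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = w_0 + \sum_{i=1}^{n} w_i x_i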

  5. Binary logistic regression
   Logistic Regression assumes a parametric form for the distribution P(Y|X), then directly estimates its parameters from the training data. The parametric model assumed by Logistic Regression in the case where Y is boolean is given by equations (1) and (2) below.
   Notice that equation (2) follows directly from equation (1), because the sum of these two probabilities must equal 1.
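Equations (1) and (2) did not survive extraction; a standard reconstruction, with the sign convention chosen to match the log likelihood on slide 9, is:

      (1) \quad P(Y = 1 \mid X) = \frac{\exp\!\left( w_0 + \sum_{i=1}^{n} w_i X_i \right)}{1 + \exp\!\left( w_0 + \sum_{i=1}^{n} w_i X_i \right)}

      (2) \quad P(Y = 0 \mid X) = \frac{1}{1 + \exp\!\left( w_0 + \sum_{i=1}^{n} w_i X_i \right)}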

  6. Binary logistic regression
   We only need one set of parameters
   Sigmoid (logistic) function
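As a concrete illustration, a minimal NumPy sketch of the sigmoid and the resulting conditional probability (function and variable names are mine, not the course's):

      import numpy as np

      def sigmoid(z):
          # Logistic function 1 / (1 + exp(-z)), maps any real z into (0, 1).
          return 1.0 / (1.0 + np.exp(-z))

      def p_y1_given_x(x, w, w0):
          # P(Y = 1 | x) under the logistic model with weights w and bias w0.
          return sigmoid(w0 + np.dot(w, x))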

  7. Logistic regression vs. Linear regression
  Adapted from slides of John Whitehead

  8. Logistic regression: Decision surface
   Given a logistic regression model w and an input x:
   Decision surface: y(x) = σ(w^T x) = constant
   Decision surfaces are linear functions of x
   Decision making on Y: see the threshold rule below
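Since the sigmoid is monotone, thresholding the probability reduces to a linear test in x (a standard reconstruction; the slide's formula was an image):

      \hat{y} = 1 \iff P(Y = 1 \mid x, w) > 0.5 \iff w_0 + \sum_{i=1}^{n} w_i x_i > 0

so the decision surface {x : P(Y = 1 | x, w) = 0.5} is the hyperplane w_0 + w^T x = 0.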

  9. Computing the likelihood in detail
   We can re-express the log of the conditional likelihood as:

      l(w) = \sum_{l} \left[ y^l \ln P(y^l = 1 \mid x^l, w) + (1 - y^l) \ln P(y^l = 0 \mid x^l, w) \right]

           = \sum_{l} \left[ y^l \ln \frac{P(y^l = 1 \mid x^l, w)}{P(y^l = 0 \mid x^l, w)} + \ln P(y^l = 0 \mid x^l, w) \right]

           = \sum_{l} \left[ y^l \left( w_0 + \sum_{i=1}^{n} w_i x_i^l \right) - \ln\!\left( 1 + \exp\!\left( w_0 + \sum_{i=1}^{n} w_i x_i^l \right) \right) \right]
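A direct NumPy transcription of the final line, treating the rows of X as the examples x^l (a sketch with my own naming; np.logaddexp(0, z) computes ln(1 + e^z) without overflow):

      import numpy as np

      def log_likelihood(w0, w, X, y):
          # l(w) = sum_l [ y^l * z^l - ln(1 + exp(z^l)) ],  z^l = w0 + w . x^l
          z = w0 + X @ w
          return np.sum(y * z - np.logaddexp(0.0, z))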

  10. Logistic regression: ML estimation
  The conditional log likelihood l(w) is concave in w.
  What are concave and convex functions?
  There is no closed-form solution.
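For reference (standard definitions, not from the slide): f is concave iff for all a, b and all \lambda \in [0, 1],

      f(\lambda a + (1 - \lambda) b) \ge \lambda f(a) + (1 - \lambda) f(b),

and convex when the inequality is reversed. Concavity of l(w) is what guarantees that following the gradient reaches the global maximum.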

  11. Optimizing concave/convex functions
   Maximum of a concave function = minimum of a convex function
   Gradient ascent (concave) / gradient descent (convex)

  12. Gradient ascent / Gradient descent
   For a function f(w):
   If f is concave: gradient ascent rule (see below)
   If f is convex: gradient descent rule (see below)
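The update rules on this slide were images; the standard forms, with step size \eta > 0, are:

      \text{ascent (concave } f\text{):} \quad w \leftarrow w + \eta \, \nabla_w f(w)

      \text{descent (convex } f\text{):} \quad w \leftarrow w - \eta \, \nabla_w f(w)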

  13. Logistic regression: Gradient descent
   Iteratively updating the weights in this fashion increases the likelihood each round
   We eventually reach the maximum
   We are near the maximum when the changes in the weights are small
   Thus, we can stop when the sum of the absolute values of the weight differences is less than some small threshold
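Putting slides 9-13 together, a minimal gradient-ascent trainer with the stopping rule just described (sum of absolute weight changes below a tolerance). This is a sketch under my own naming and default hyperparameters, not the course's reference code:

      import numpy as np

      def train_logistic(X, y, eta=0.01, tol=1e-6, max_iters=10000):
          # X: (m, n) design matrix; y: (m,) labels in {0, 1}.
          m, n = X.shape
          Xb = np.hstack([np.ones((m, 1)), X])    # prepend a column of 1s for w0
          w = np.zeros(n + 1)
          for _ in range(max_iters):
              p = 1.0 / (1.0 + np.exp(-(Xb @ w)))   # P(y=1 | x, w) per example
              grad = Xb.T @ (y - p)                 # gradient of the log likelihood
              w_new = w + eta * grad                # ascent step: likelihood increases
              if np.sum(np.abs(w_new - w)) < tol:   # stop when weight changes are tiny
                  return w_new
              w = w_new
          return w

Because l(w) is concave, this converges to the global maximum for any sufficiently small eta.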

  14. Logistic regression: Multi-class
   In the two-class case we used the logistic sigmoid
   For multiclass, we work with the soft-max function instead
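The soft-max formula itself was an image; a minimal, numerically stable sketch (naming is mine):

      import numpy as np

      def softmax(scores):
          # scores[k] = w_k . x for each class k; returns P(Y = k | x) for all k.
          shifted = scores - np.max(scores)   # subtract the max to avoid overflow in exp
          e = np.exp(shifted)
          return e / e.sum()

With two classes, softmax reduces to the logistic sigmoid of the score difference, recovering the binary model.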

  15. Logistic Regression: Regularization
   Overfitting the training data is a problem that can arise in Logistic Regression, especially when the data is very high-dimensional and sparse.
   One approach to reducing overfitting is regularization, in which we create a modified "penalized log likelihood function" that penalizes large values of w:

      w = \arg\max_{w} \sum_{l} \ln P(y^l \mid x^l, w) - \frac{\lambda}{2} \lVert w \rVert^2

   The derivative of this penalized log likelihood function is similar to our earlier derivative, with one additional penalty term:

      \frac{\partial l(w)}{\partial w_i} = \sum_{l} x_i^l \left( y^l - \hat{P}(y^l = 1 \mid x^l, w) \right) - \lambda w_i

   which gives us the modified gradient ascent rule:

      w_i \leftarrow w_i + \eta \sum_{l} x_i^l \left( y^l - \hat{P}(y^l = 1 \mid x^l, w) \right) - \eta \lambda w_i
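In code, the penalty only changes the gradient by an extra -lam * w term. A sketch adapting the earlier training loop (lam and eta are hypothetical hyperparameters; for brevity this version also penalizes the bias w0, which implementations often avoid):

      import numpy as np

      def train_logistic_l2(X, y, lam=0.1, eta=0.01, iters=5000):
          # Same ascent loop as before, with the L2 penalty added to the gradient.
          m, n = X.shape
          Xb = np.hstack([np.ones((m, 1)), X])
          w = np.zeros(n + 1)
          for _ in range(iters):
              p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
              grad = Xb.T @ (y - p) - lam * w   # penalized gradient: extra -lam * w
              w += eta * grad
          return w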

  16. Logistic Regression vs. Naïve Bayes
   In general, NB and LR make different assumptions
   NB: features independent given class -> an assumption on P(X|Y)
   LR: functional form of P(Y|X), no assumption on P(X|Y)
   LR is a linear classifier: the decision rule is a hyperplane
   LR is optimized by conditional likelihood: no closed-form solution; concave -> global optimum with gradient ascent

  17. Logistic Regression vs. Naïve Bayes
   Consider Y and the Xi boolean, X = <X1, ..., Xn>
   Number of parameters:
   NB: 2n + 1
   LR: n + 1
   Estimation method:
   NB parameter estimates are uncoupled
   LR parameter estimates are coupled
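To see where the counts come from (a one-line check, not spelled out on the slide): NB must estimate P(Xi = 1 | Y = 0) and P(Xi = 1 | Y = 1) for each of the n features (2n parameters) plus the prior P(Y = 1) (one more), giving 2n + 1; LR needs only the weights w_1, ..., w_n plus the bias w_0, giving n + 1.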

  18. Logistic Regression vs. Gaussian Naïve Bayes
   When the GNB modeling assumptions do not hold, Logistic Regression and GNB typically learn different classifier functions
   While Logistic Regression is consistent with the Naïve Bayes assumption that the input features Xi are conditionally independent given Y, it is not rigidly tied to this assumption as Naïve Bayes is
   GNB parameter estimates converge toward their asymptotic values in order log(n) examples, where n is the dimension of X; Logistic Regression parameter estimates converge more slowly, requiring order n examples

  19. Summary
   Logistic Regression learns the conditional probability distribution P(y|x)
   Local search: begins with an initial weight vector and modifies it iteratively to maximize an objective function
   The objective function is the conditional log likelihood of the data, so the algorithm seeks the probability distribution P(y|x) that is most likely given the data

  20. Any Questions? End of Lecture 9. Thank you! Spring 2015. http://ce.sharif.edu/courses/93-94/2/ce717-1/
