MLE/MAP + Naïve Bayes


  1. 10-601 Introduction to Machine Learning. Machine Learning Department, School of Computer Science, Carnegie Mellon University. MLE/MAP + Naïve Bayes. Matt Gormley, Lecture 17, Mar. 20, 2020.

  2. Reminders. Homework 5: Neural Networks. Out: Fri, Feb 28; Due: Sun, Mar 22 at 11:59pm. Homework 6: Learning Theory / Generative Models. Out: Fri, Mar 20; Due: Fri, Mar 27 at 11:59pm. TIP: Do the readings! Today's In-Class Poll: http://poll.mlcourse.org. Matt's new after-class office hours (on Zoom).

  3. MLE AND MAP

  4. Likelihood Function (One R.V.)
     Suppose we have N samples D = {x^(1), x^(2), ..., x^(N)} from a random variable X.
     The likelihood function:
     Case 1: X is discrete with pmf p(x|θ): L(θ) = p(x^(1)|θ) p(x^(2)|θ) ... p(x^(N)|θ)
     Case 2: X is continuous with pdf f(x|θ): L(θ) = f(x^(1)|θ) f(x^(2)|θ) ... f(x^(N)|θ)
     In both cases (discrete/continuous), the likelihood tells us how likely one sample is relative to another.
     The log-likelihood function:
     Case 1: X is discrete with pmf p(x|θ): ℓ(θ) = log p(x^(1)|θ) + ... + log p(x^(N)|θ)
     Case 2: X is continuous with pdf f(x|θ): ℓ(θ) = log f(x^(1)|θ) + ... + log f(x^(N)|θ)
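
To make these definitions concrete, here is a minimal sketch (not from the slides) that evaluates L(θ) and ℓ(θ) for a tiny discrete sample under two candidate values of θ, using a Bernoulli pmf as in the later in-class exercise; the sample and the candidate θ values are made up for illustration.

```python
import numpy as np

# Hypothetical sample of N = 5 coin flips (values made up for illustration).
x = np.array([1, 0, 1, 1, 0])

def bernoulli_pmf(x, theta):
    # p(x | theta): theta when x = 1, (1 - theta) when x = 0
    return theta ** x * (1 - theta) ** (1 - x)

for theta in [0.3, 0.6]:
    L = np.prod(bernoulli_pmf(x, theta))           # L(theta): product of per-sample pmfs
    ll = np.sum(np.log(bernoulli_pmf(x, theta)))   # l(theta): sum of per-sample log pmfs
    print(f"theta = {theta}: L(theta) = {L:.5f}, l(theta) = {ll:.3f}")
```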

  5. Likelihood Function (Two R.V.s)
     Suppose we have N samples D = {(x^(1), y^(1)), ..., (x^(N), y^(N))} from a pair of random variables X, Y.
     The conditional likelihood function:
     Case 1: Y is discrete with pmf p(y | x, θ): L(θ) = p(y^(1) | x^(1), θ) ... p(y^(N) | x^(N), θ)
     Case 2: Y is continuous with pdf f(y | x, θ): L(θ) = f(y^(1) | x^(1), θ) ... f(y^(N) | x^(N), θ)
     The joint likelihood function:
     Case 1: X and Y are discrete with pmf p(x, y|θ): L(θ) = p(x^(1), y^(1)|θ) ... p(x^(N), y^(N)|θ)
     Case 2: X and Y are continuous with pdf f(x, y|θ): L(θ) = f(x^(1), y^(1)|θ) ... f(x^(N), y^(N)|θ)

  6. Likelihood Function (Two R.V.s)
     Suppose we have N samples D = {(x^(1), y^(1)), ..., (x^(N), y^(N))} from a pair of random variables X, Y.
     The joint likelihood function:
     Case 1: X and Y are discrete with pmf p(x, y|θ): L(θ) = p(x^(1), y^(1)|θ) ... p(x^(N), y^(N)|θ)
     Case 2: X and Y are continuous with pdf f(x, y|θ): L(θ) = f(x^(1), y^(1)|θ) ... f(x^(N), y^(N)|θ)
     Mixed discrete/continuous cases:
     Case 3: Y is discrete with pmf p(y|φ) and X is continuous with pdf f(x|y, θ):
             L(φ, θ) = f(x^(1) | y^(1), θ) p(y^(1)|φ) ... f(x^(N) | y^(N), θ) p(y^(N)|φ)
     Case 4: Y is continuous with pdf f(y|φ) and X is discrete with pmf p(x|y, θ):
             L(φ, θ) = p(x^(1) | y^(1), θ) f(y^(1)|φ) ... p(x^(N) | y^(N), θ) f(y^(N)|φ)
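
Case 3 is the shape of a generative classifier's likelihood: a discrete label paired with a continuous feature. The snippet below is an illustrative sketch, not the lecture's code; the labels, features, and parameter values (a label prior φ and class-conditional Gaussians) are all assumed for the example.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical labeled sample (made up): binary labels y with real-valued features x.
y = np.array([1, 0, 1, 1, 0])
x = np.array([2.1, -0.3, 1.7, 2.6, 0.1])

# Illustrative parameters: phi = p(Y = 1); class-conditional Gaussians for X given Y.
phi = 0.6
mu = {0: 0.0, 1: 2.0}
sigma = 1.0

# Case 3 joint log-likelihood: sum_i [ log f(x^(i) | y^(i), theta) + log p(y^(i) | phi) ]
log_L = np.sum(norm.logpdf(x, loc=np.where(y == 1, mu[1], mu[0]), scale=sigma)
               + np.where(y == 1, np.log(phi), np.log(1 - phi)))
print(f"joint log-likelihood: {log_L:.3f}")
```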

  7. MLE
     Principle of Maximum Likelihood Estimation: Choose the parameters that maximize the likelihood of the data.
     Maximum Likelihood Estimate (MLE).
     [Figure: likelihood curves L(θ) and L(θ_1, θ_2), with θ_MLE marked at the maximum.]

  8. MLE
     What does maximizing likelihood accomplish? There is only a finite amount of probability mass (i.e., the sum-to-one constraint). MLE tries to allocate as much probability mass as possible to the things we have observed... at the expense of the things we have not observed.

  9. Recipe for Closed-form MLE
     1. Assume data was generated i.i.d. from some model (i.e., write the generative story): x^(i) ~ p(x|θ)
     2. Write the log-likelihood: ℓ(θ) = log p(x^(1)|θ) + ... + log p(x^(N)|θ)
     3. Compute partial derivatives (i.e., the gradient): ∂ℓ(θ)/∂θ_1 = ..., ∂ℓ(θ)/∂θ_2 = ..., ..., ∂ℓ(θ)/∂θ_M = ...
     4. Set derivatives to zero and solve for θ: ∂ℓ(θ)/∂θ_m = 0 for all m ∈ {1, ..., M}; θ_MLE = solution to a system of M equations in M variables
     5. Compute the second derivative and check that ℓ(θ) is concave down at θ_MLE
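
As a sanity check on the recipe (not part of the slides), the sketch below applies it to the exponential distribution that the next slides work through: it compares the closed-form solution from step 4, λ_MLE = N / Σ_i x^(i) = 1 / (sample mean), against a direct numerical maximization of the log-likelihood on simulated data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=1000)   # simulate N samples with true rate lambda = 2

def neg_log_likelihood(lam):
    # l(lambda) = N log(lambda) - lambda * sum(x) for the pdf f(x|lambda) = lambda * exp(-lambda * x)
    return -(len(x) * np.log(lam) - lam * np.sum(x))

closed_form = 1.0 / np.mean(x)                  # step 4 of the recipe, solved analytically
numerical = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded").x
print(f"closed-form MLE: {closed_form:.4f}, numerical MLE: {numerical:.4f}")
```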

  10. MLE Example: MLE of the Exponential Distribution. [Goal and derivation steps worked on the slide; not captured in this transcription.]

  11. MLE Example: MLE of the Exponential Distribution (continued).

  12. MLE Example: MLE of the Exponential Distribution (continued).
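
The worked derivation on slides 10-12 is not captured in this transcription; a sketch of the standard closed-form steps, assuming the parameterization f(x | λ) = λ e^{-λx} and following the recipe above, is:

```latex
% MLE of the exponential rate parameter lambda, following the recipe.
\begin{align*}
\ell(\lambda) &= \sum_{i=1}^{N} \log f(x^{(i)} \mid \lambda)
              = \sum_{i=1}^{N} \log\!\left(\lambda e^{-\lambda x^{(i)}}\right)
              = N \log \lambda - \lambda \sum_{i=1}^{N} x^{(i)} \\
\frac{d\ell(\lambda)}{d\lambda} &= \frac{N}{\lambda} - \sum_{i=1}^{N} x^{(i)} = 0
\quad\Longrightarrow\quad
\lambda_{\mathrm{MLE}} = \frac{N}{\sum_{i=1}^{N} x^{(i)}} \\
\frac{d^2\ell(\lambda)}{d\lambda^2} &= -\frac{N}{\lambda^2} < 0
\quad\text{(concave down, so this critical point is a maximum)}
\end{align*}
```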

  13. MLE In-Class Exercise
      Show that the MLE of parameter φ for N samples drawn from Bernoulli(φ) is φ_MLE = N_1 / N.
      Steps to answer:
      1. Write the log-likelihood of the sample
      2. Compute the derivative w.r.t. φ
      3. Set the derivative to zero and solve for φ

  14. MLE
      Question: Assume we have N samples x^(1), x^(2), ..., x^(N) drawn from a Bernoulli(φ).
      What is the log-likelihood of the data, ℓ(φ)?
      Assume N_1 = # of (x^(i) = 1) and N_0 = # of (x^(i) = 0).
      Answer:
      A. ℓ(φ) = N_1 log(φ) + N_0 (1 - log(φ))
      B. ℓ(φ) = N_1 log(φ) + N_0 log(1 - φ)
      C. ℓ(φ) = log(φ)^N_1 + (1 - log(φ))^N_0
      D. ℓ(φ) = log(φ)^N_1 + log(1 - φ)^N_0
      E. ℓ(φ) = N_0 log(φ) + N_1 (1 - log(φ))
      F. ℓ(φ) = N_0 log(φ) + N_1 log(1 - φ)
      G. ℓ(φ) = log(φ)^N_0 + (1 - log(φ))^N_1
      H. ℓ(φ) = log(φ)^N_0 + log(1 - φ)^N_1
      I. ℓ(φ) = the most likely answer
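
As a numerical check (not from the slides), the sketch below evaluates the Bernoulli log-likelihood ℓ(φ) = N_1 log(φ) + N_0 log(1 - φ) over a grid of φ values on simulated coin flips and confirms that the maximizer matches the fraction of ones in the sample, which is the standard closed-form Bernoulli MLE asked for in the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.7, size=200)       # simulated Bernoulli(0.7) sample (illustrative)
N1, N0 = np.sum(x == 1), np.sum(x == 0)

def log_likelihood(phi):
    # l(phi) = N1 log(phi) + N0 log(1 - phi)
    return N1 * np.log(phi) + N0 * np.log(1 - phi)

grid = np.linspace(0.001, 0.999, 999)
phi_grid_max = grid[np.argmax(log_likelihood(grid))]
print(f"grid maximizer: {phi_grid_max:.3f}, sample fraction of ones N1/N: {N1 / len(x):.3f}")
```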
