Bayesian Learning

[Read Ch. 6] [Suggested exercises: ...]

- Bayes Theorem
- MAP, ML hypotheses
- MAP learners
- Minimum description length principle
- Bayes optimal classifier
- Naive Bayes learner
- Example: Learning over text data
- Bayesian belief networks
- Expectation Maximization algorithm

(Lecture slides for the textbook Machine Learning, T. Mitchell, McGraw Hill, 1997.)
Two Roles for Bayesian Methods

Provides practical learning algorithms:
- Naive Bayes learning
- Bayesian belief network learning
- Combine prior knowledge (prior probabilities) with observed data
- Requires prior probabilities

Provides useful conceptual framework:
- Provides "gold standard" for evaluating other learning algorithms
- Additional insight into Occam's razor
Bayes Theorem

P(h|D) = \frac{P(D|h) P(h)}{P(D)}

- P(h) = prior probability of hypothesis h
- P(D) = prior probability of training data D
- P(h|D) = probability of h given D
- P(D|h) = probability of D given h
Choosing Hypotheses

P(h|D) = \frac{P(D|h) P(h)}{P(D)}

Generally we want the most probable hypothesis given the training data.

Maximum a posteriori hypothesis h_{MAP}:

h_{MAP} = \arg\max_{h \in H} P(h|D)
        = \arg\max_{h \in H} \frac{P(D|h) P(h)}{P(D)}
        = \arg\max_{h \in H} P(D|h) P(h)

If we assume P(h_i) = P(h_j) for all i, j, then we can simplify further and choose the maximum likelihood (ML) hypothesis:

h_{ML} = \arg\max_{h_i \in H} P(D|h_i)
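A minimal sketch of the MAP/ML distinction over a toy discrete hypothesis space; all priors and likelihoods here are invented for illustration, not from the text:

    # Sketch: MAP vs. ML over a small hypothesis space (illustrative numbers).
    priors = {"h1": 0.90, "h2": 0.09, "h3": 0.01}       # P(h)
    likelihoods = {"h1": 0.10, "h2": 0.40, "h3": 0.90}  # P(D|h)

    # ML ignores the prior; MAP weights the likelihood by the prior.
    h_ml = max(likelihoods, key=likelihoods.get)
    h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

    print(h_ml)   # h3: highest P(D|h)
    print(h_map)  # h1: the strong prior outweighs the weaker likelihood

With a uniform prior the two choices coincide, which is exactly the simplification stated above.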
Bayes Theorem

Does the patient have cancer or not?

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer.

P(cancer) = .008          P(\neg cancer) = .992
P(+|cancer) = .98         P(-|cancer) = .02
P(+|\neg cancer) = .03    P(-|\neg cancer) = .97
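A quick numeric check of this example; the figures follow directly from the slide:

    # Posterior probability of cancer given a positive test, via Bayes theorem.
    p_cancer = 0.008
    p_not_cancer = 1 - p_cancer
    p_pos_given_cancer = 0.98
    p_pos_given_not = 0.03

    # P(+) by the theorem of total probability.
    p_pos = p_pos_given_cancer * p_cancer + p_pos_given_not * p_not_cancer

    p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
    print(round(p_cancer_given_pos, 3))  # ~0.21: h_MAP is still "no cancer"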
Basic Formulas for Probabilities

- Product rule: probability P(A \wedge B) of a conjunction of two events A and B:

  P(A \wedge B) = P(A|B) P(B) = P(B|A) P(A)

- Sum rule: probability of a disjunction of two events A and B:

  P(A \vee B) = P(A) + P(B) - P(A \wedge B)

- Theorem of total probability: if events A_1, \ldots, A_n are mutually exclusive with \sum_{i=1}^n P(A_i) = 1, then

  P(B) = \sum_{i=1}^n P(B|A_i) P(A_i)
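A small sanity check of all three rules on a made-up joint distribution over two binary events A and B (the numbers are illustrative only):

    # Verify product rule, sum rule, and total probability on a toy joint.
    joint = {(True, True): 0.2, (True, False): 0.3,
             (False, True): 0.1, (False, False): 0.4}

    p_a = sum(p for (a, _), p in joint.items() if a)   # P(A) = 0.5
    p_b = sum(p for (_, b), p in joint.items() if b)   # P(B) = 0.3
    p_ab = joint[(True, True)]                         # P(A ^ B) = 0.2
    p_a_given_b = p_ab / p_b                           # P(A|B)

    assert abs(p_ab - p_a_given_b * p_b) < 1e-12       # product rule
    p_a_or_b = p_a + p_b - p_ab                        # sum rule
    assert abs(p_a_or_b - (1 - joint[(False, False)])) < 1e-12
    # total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    p_b_given_a = p_ab / p_a
    p_b_given_na = joint[(False, True)] / (1 - p_a)
    assert abs(p_b - (p_b_given_a * p_a + p_b_given_na * (1 - p_a))) < 1e-12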
Brute Force MAP Hypothesis Learner

1. For each hypothesis h in H, calculate the posterior probability

   P(h|D) = \frac{P(D|h) P(h)}{P(D)}

2. Output the hypothesis h_{MAP} with the highest posterior probability

   h_{MAP} = \arg\max_{h \in H} P(h|D)
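A minimal sketch of this learner for a finite hypothesis space; `prior` and `likelihood` are hypothetical callables supplied by the caller:

    # Brute-force MAP: score every hypothesis, return the best.
    def brute_force_map(hypotheses, prior, likelihood, data):
        # Unnormalized posteriors P(D|h) P(h); P(D) is the same for every h,
        # so it can be ignored when taking the argmax.
        scores = {h: likelihood(data, h) * prior(h) for h in hypotheses}
        return max(scores, key=scores.get)

Normalizing each score by P(D) = \sum_h P(D|h) P(h) would give true posteriors but does not change the argmax; that is why step 1's division by P(D) can be skipped in practice.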
Relation to Concept Learning

Consider our usual concept learning task:
- instance space X, hypothesis space H, training examples D
- consider the FindS learning algorithm (outputs most specific hypothesis from the version space VS_{H,D})

What would Bayes rule produce as the MAP hypothesis?

Does FindS output a MAP hypothesis?
Relation to Concept Learning

Assume a fixed set of instances \langle x_1, \ldots, x_m \rangle

Assume D is the set of classifications: D = \langle c(x_1), \ldots, c(x_m) \rangle

Choose P(D|h):
Relation to Concept Learning

Assume a fixed set of instances \langle x_1, \ldots, x_m \rangle

Assume D is the set of classifications: D = \langle c(x_1), \ldots, c(x_m) \rangle

Choose P(D|h):
- P(D|h) = 1 if h consistent with D
- P(D|h) = 0 otherwise

Choose P(h) to be the uniform distribution:
- P(h) = \frac{1}{|H|} for all h in H

Then:

P(h|D) = \begin{cases} \frac{1}{|VS_{H,D}|} & \text{if } h \text{ is consistent with } D \\ 0 & \text{otherwise} \end{cases}
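A sketch of this result on a hypothetical toy space: H is all 16 boolean functions over 2 binary inputs, the target concept and observed instances are invented for illustration. The uniform prior plus 0/1 likelihood makes the posterior uniform over the version space:

    from itertools import product

    instances = list(product([0, 1], repeat=2))
    H = list(product([0, 1], repeat=len(instances)))  # h = one output per instance

    target = (0, 1, 1, 1)            # hypothetical target concept (logical OR)
    observed = instances[:2]         # D: labels for the first two instances only

    def consistent(h, data):
        return all(h[instances.index(x)] == target[instances.index(x)]
                   for x in data)

    VS = [h for h in H if consistent(h, observed)]
    # P(h|D) = (1 * 1/|H|) / sum over consistent h of (1 * 1/|H|) = 1/|VS|
    posterior = {h: (1 / len(VS) if h in VS else 0.0) for h in H}
    print(len(VS), posterior[VS[0]])  # 4 consistent hypotheses, each with P = 0.25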
Evolution of Posterior Probabilities

[Figure: three panels (a), (b), (c) over the hypothesis space, showing P(h), P(h|D1), and P(h|D1, D2) respectively; the posterior concentrates on fewer hypotheses as more data is observed.]
Characterizing Learning Algorithms by Equivalent MAP Learners

[Diagram: an inductive system (Candidate Elimination Algorithm, taking training examples D and hypothesis space H, producing output hypotheses) shown as equivalent to a Bayesian inference system (brute force MAP learner taking the same inputs, with P(h) uniform and P(D|h) = 0 if inconsistent, = 1 if consistent). In the Bayesian system the prior assumptions are made explicit.]
Learning A Real Valued Function

[Figure: noisy training points scattered around target function f in the x-y plane, with the maximum likelihood hypothesis h_ML fit through them; e marks the noise on one example.]

Consider any real-valued target function f.

Training examples \langle x_i, d_i \rangle, where d_i is a noisy training value:
- d_i = f(x_i) + e_i
- e_i is a random variable (noise) drawn independently for each x_i according to some Gaussian distribution with mean = 0

Then the maximum likelihood hypothesis h_{ML} is the one that minimizes the sum of squared errors:

h_{ML} = \arg\min_{h \in H} \sum_{i=1}^m (d_i - h(x_i))^2
Learning A Real Valued Function

h_{ML} = \arg\max_{h \in H} p(D|h)
       = \arg\max_{h \in H} \prod_{i=1}^m p(d_i|h)
       = \arg\max_{h \in H} \prod_{i=1}^m \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2}

Maximize the natural log of this instead:

h_{ML} = \arg\max_{h \in H} \sum_{i=1}^m \left[ \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2 \right]
       = \arg\max_{h \in H} \sum_{i=1}^m -\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2
       = \arg\max_{h \in H} \sum_{i=1}^m -(d_i - h(x_i))^2
       = \arg\min_{h \in H} \sum_{i=1}^m (d_i - h(x_i))^2
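A minimal sketch of the conclusion, assuming (hypothetically) that H is the space of linear functions h(x) = w_0 + w_1 x; the target function and noise level are invented for illustration:

    # With Gaussian noise, least-squares fit == maximum likelihood hypothesis.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)

    def f(x):
        return 2.0 * x + 1.0                     # illustrative target function

    d = f(x) + rng.normal(0, 0.1, x.shape)       # d_i = f(x_i) + e_i, e_i ~ N(0, s^2)

    # Minimize sum of squared errors over linear hypotheses.
    A = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    print(w)                                     # close to [1.0, 2.0]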
Learning to Predict Probabilities

Consider predicting survival probability from patient data.

Training examples \langle x_i, d_i \rangle, where d_i is 1 or 0.

Want to train a neural network to output a probability given x_i (not a 0 or 1).

In this case we can show:

h_{ML} = \arg\max_{h \in H} \sum_{i=1}^m d_i \ln h(x_i) + (1 - d_i) \ln(1 - h(x_i))

Weight update rule for a sigmoid unit:

w_{jk} \leftarrow w_{jk} + \Delta w_{jk}

where

\Delta w_{jk} = \eta \sum_{i=1}^m (d_i - h(x_i)) \, x_{ijk}
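A sketch of this update rule for a single sigmoid unit (i.e., logistic regression); the data, learning rate, and gradient averaging are illustrative assumptions, not from the slide:

    # Gradient ascent on the cross-entropy objective for one sigmoid unit.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                  # 100 patients, 3 features
    true_w = np.array([1.5, -2.0, 0.5])            # hypothetical true weights
    d = (rng.random(100) < sigmoid(X @ true_w)).astype(float)  # 0/1 outcomes

    w, eta = np.zeros(3), 0.1
    for _ in range(500):
        h = sigmoid(X @ w)
        # Delta w = eta * sum_i (d_i - h(x_i)) x_i  (averaged here for stability);
        # this ascends sum_i [d_i ln h(x_i) + (1 - d_i) ln(1 - h(x_i))].
        w += eta * X.T @ (d - h) / len(d)
    print(w)                                       # roughly recovers true_w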