Bayesian Learning    [Read Ch. 6]    [Suggested exercises: 6.1, 6.2, 6.6]

• Bayes theorem
• MAP, ML hypotheses
• MAP learners
• Minimum description length principle
• Bayes optimal classifier
• Naive Bayes learner
• Example: learning over text data
• Bayesian belief networks
• Expectation Maximization algorithm

Lecture slides for the textbook Machine Learning, T. Mitchell, McGraw Hill, 1997.
Two Roles for Bayesian Methods

Provides practical learning algorithms:
• Naive Bayes learning
• Bayesian belief network learning
• Combine prior knowledge (prior probabilities) with observed data
• Requires prior probabilities

Provides a useful conceptual framework:
• Provides a "gold standard" for evaluating other learning algorithms
• Additional insight into Occam's razor
Bayes Theorem

    P(h|D) = P(D|h) P(h) / P(D)

• P(h)   = prior probability of hypothesis h
• P(D)   = prior probability of training data D
• P(h|D) = probability of h given D
• P(D|h) = probability of D given h
Choosing Hypotheses

    P(h|D) = P(D|h) P(h) / P(D)

Generally we want the most probable hypothesis given the training data, the maximum a posteriori hypothesis h_MAP:

    h_MAP = argmax_{h in H} P(h|D)
          = argmax_{h in H} P(D|h) P(h) / P(D)
          = argmax_{h in H} P(D|h) P(h)

If we assume P(h_i) = P(h_j) for all i, j, we can simplify further and choose the maximum likelihood (ML) hypothesis:

    h_ML = argmax_{h_i in H} P(D | h_i)
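A minimal numeric illustration of the difference between the two criteria, using made-up priors and likelihoods for three hypotheses (all values are hypothetical):

    # Hypothetical hypothesis space with made-up priors and likelihoods
    priors = {'h1': 0.90, 'h2': 0.07, 'h3': 0.03}        # P(h)
    likelihoods = {'h1': 0.10, 'h2': 0.40, 'h3': 0.80}   # P(D | h)

    h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])   # argmax P(D|h) P(h)
    h_ml = max(likelihoods, key=likelihoods.get)                    # argmax P(D|h)

    print(h_map, h_ml)   # 'h1' 'h3': a strong prior can outweigh the likelihood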
Bayes Theorem

Does the patient have cancer or not?

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.

    P(cancer)      =          P(¬cancer)      =
    P(+ | cancer)  =          P(− | cancer)   =
    P(+ | ¬cancer) =          P(− | ¬cancer)  =
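A minimal sketch filling in these quantities and applying Bayes theorem to a positive test result (the numbers come from the problem statement; normalizing by P(+) uses the theorem of total probability from the next slide):

    # Quantities from the problem statement
    p_cancer = 0.008                          # P(cancer): prior prevalence
    p_not_cancer = 1 - p_cancer               # P(~cancer)
    p_pos_given_cancer = 0.98                 # P(+ | cancer)
    p_neg_given_not = 0.97                    # P(- | ~cancer)
    p_pos_given_not = 1 - p_neg_given_not     # P(+ | ~cancer)

    # Unnormalized posteriors for a positive result
    joint_cancer = p_pos_given_cancer * p_cancer    # P(+ | cancer) P(cancer)
    joint_not = p_pos_given_not * p_not_cancer      # P(+ | ~cancer) P(~cancer)

    # Normalize by P(+)
    p_cancer_given_pos = joint_cancer / (joint_cancer + joint_not)
    print(p_cancer_given_pos)   # ~0.21, so the MAP hypothesis is still ~cancer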
Basic Formulas for Probabilities

• Product rule: probability P(A ∧ B) of a conjunction of two events A and B:
      P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)

• Sum rule: probability of a disjunction of two events A and B:
      P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

• Theorem of total probability: if events A_1, ..., A_n are mutually exclusive with Σ_{i=1}^n P(A_i) = 1, then
      P(B) = Σ_{i=1}^n P(B | A_i) P(A_i)
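A small numeric check of the three rules on a made-up joint distribution over two binary events (the probability table is purely illustrative):

    # Hypothetical joint distribution over two binary events A and B
    p = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.4}

    p_a = sum(v for (a, b), v in p.items() if a)   # P(A)
    p_b = sum(v for (a, b), v in p.items() if b)   # P(B)
    p_a_and_b = p[(True, True)]                    # P(A ^ B)

    # Product rule: P(A ^ B) = P(A | B) P(B)
    assert abs(p_a_and_b - (p_a_and_b / p_b) * p_b) < 1e-12

    # Sum rule: P(A v B) = P(A) + P(B) - P(A ^ B)
    assert abs((p_a + p_b - p_a_and_b) - (1 - p[(False, False)])) < 1e-12

    # Total probability with the partition {A, ~A}
    p_b_given_a = p[(True, True)] / p_a
    p_b_given_not_a = p[(False, True)] / (1 - p_a)
    assert abs(p_b_given_a * p_a + p_b_given_not_a * (1 - p_a) - p_b) < 1e-12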
Brute Force MAP Hypothesis Learner

1. For each hypothesis h in H, calculate the posterior probability
       P(h|D) = P(D|h) P(h) / P(D)

2. Output the hypothesis h_MAP with the highest posterior probability
       h_MAP = argmax_{h in H} P(h|D)
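A minimal sketch of this learner, assuming the hypothesis space is small enough to enumerate and that the caller supplies prior(h) and likelihood(h, data) (both names are placeholders, not from the text):

    def brute_force_map(hypotheses, prior, likelihood, data):
        """Return the MAP hypothesis by enumerating the whole hypothesis space.

        prior(h)             -> P(h)
        likelihood(h, data)  -> P(D | h)
        """
        # P(D) is the same for every h, so the unnormalized posterior suffices
        scores = {h: likelihood(h, data) * prior(h) for h in hypotheses}
        return max(scores, key=scores.get)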
Relation to Concept Learning

Consider our usual concept learning task:
• instance space X, hypothesis space H, training examples D
• consider the FindS learning algorithm (outputs the most specific hypothesis from the version space VS_{H,D})

What would Bayes rule produce as the MAP hypothesis?

Does FindS output a MAP hypothesis?
Relation to Concept Learning

Assume a fixed set of instances ⟨x_1, ..., x_m⟩.

Assume D is the set of classifications D = ⟨c(x_1), ..., c(x_m)⟩.

Choose P(D|h):
Relation to Concept Learning

Assume a fixed set of instances ⟨x_1, ..., x_m⟩.

Assume D is the set of classifications D = ⟨c(x_1), ..., c(x_m)⟩.

Choose P(D|h):
• P(D|h) = 1 if h is consistent with D
• P(D|h) = 0 otherwise

Choose P(h) to be the uniform distribution:
• P(h) = 1/|H| for all h in H

Then:

    P(h|D) = 1 / |VS_{H,D}|   if h is consistent with D
           = 0                otherwise
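A minimal sketch of this posterior for a finite hypothesis space, assuming each hypothesis is a callable mapping an instance to its classification (the representation is illustrative, not from the text):

    def concept_posterior(hypotheses, instances, labels):
        """P(h|D) under a uniform prior P(h) = 1/|H| and the 0/1 likelihood above."""
        def consistent(h):
            return all(h(x) == c for x, c in zip(instances, labels))

        version_space = [h for h in hypotheses if consistent(h)]
        # Every consistent hypothesis gets 1/|VS_{H,D}|; every other hypothesis gets 0
        return {h: (1.0 / len(version_space) if h in version_space else 0.0)
                for h in hypotheses}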
Evolution of Posterior Probabilities

[Figure: three plots over the hypothesis space, (a) the prior P(h), (b) the posterior P(h | D1), and (c) the posterior P(h | D1, D2); as training data accumulates, the probability mass concentrates on the hypotheses consistent with the data.]
Characterizing Learning Algorithms by Equivalent MAP Learners

[Figure: two systems given the same inputs.
Inductive system: training examples D and hypothesis space H feed the Candidate Elimination algorithm, which outputs hypotheses.
Equivalent Bayesian inference system: training examples D and hypothesis space H feed a brute-force MAP learner whose prior assumptions are made explicit: P(h) uniform, P(D|h) = 1 if h is consistent with D and 0 otherwise; it outputs the same hypotheses.]
Learning A Real Valued Function

Consider any real-valued target function f.

Training examples ⟨x_i, d_i⟩, where d_i is a noisy training value:
• d_i = f(x_i) + e_i
• e_i is a random variable (noise) drawn independently for each x_i according to some Gaussian distribution with mean = 0

[Figure: training points (x, d) scattered around the target function f, with the maximum likelihood hypothesis h_ML fit through them.]

Then the maximum likelihood hypothesis h_ML is the one that minimizes the sum of squared errors:

    h_ML = argmin_{h in H} Σ_{i=1}^m (d_i − h(x_i))^2
Learning A Real Valued Function

    h_ML = argmax_{h in H} p(D|h)
         = argmax_{h in H} Π_{i=1}^m p(d_i | h)
         = argmax_{h in H} Π_{i=1}^m (1 / sqrt(2πσ^2)) exp( −(1/2) ((d_i − h(x_i)) / σ)^2 )

Maximize the natural log of this instead:

    h_ML = argmax_{h in H} Σ_{i=1}^m [ ln (1 / sqrt(2πσ^2)) − (1/2) ((d_i − h(x_i)) / σ)^2 ]
         = argmax_{h in H} Σ_{i=1}^m − (1/2) ((d_i − h(x_i)) / σ)^2
         = argmax_{h in H} Σ_{i=1}^m − (d_i − h(x_i))^2
         = argmin_{h in H} Σ_{i=1}^m (d_i − h(x_i))^2
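A minimal sketch of this result in practice: fitting a linear hypothesis to noisy samples of a hypothetical target function by minimizing the sum of squared errors (the target function, noise level, and use of numpy's least-squares solver are all assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical target f(x) = 2x + 1, observed with zero-mean Gaussian noise
    x = np.linspace(0.0, 1.0, 50)
    d = 2 * x + 1 + rng.normal(0.0, 0.1, size=x.shape)   # d_i = f(x_i) + e_i

    # H = linear hypotheses h(x) = w1*x + w0; least squares yields h_ML
    A = np.column_stack([x, np.ones_like(x)])
    (w1, w0), *_ = np.linalg.lstsq(A, d, rcond=None)
    print(w1, w0)   # close to the true slope 2 and intercept 1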
Learning to Predict Probabilities

Consider predicting survival probability from patient data.

Training examples ⟨x_i, d_i⟩, where d_i is 1 or 0.

We want to train a neural network to output a probability given x_i (not a 0 or 1).

In this case one can show

    h_ML = argmax_{h in H} Σ_{i=1}^m [ d_i ln h(x_i) + (1 − d_i) ln (1 − h(x_i)) ]

Weight update rule for a sigmoid unit:

    w_jk ← w_jk + Δw_jk

where

    Δw_jk = η Σ_{i=1}^m (d_i − h(x_i)) x_{ijk}
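A minimal sketch of this update rule for a single sigmoid unit trained by batch gradient ascent (the learning rate, epoch count, and data shapes are placeholders):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_sigmoid_unit(x, d, eta=0.1, epochs=1000):
        """Maximize Σ_i d_i ln h(x_i) + (1 − d_i) ln(1 − h(x_i)) by gradient ascent.

        x: (m, n) array of inputs; d: (m,) array of 0/1 targets.
        """
        w = np.zeros(x.shape[1])
        for _ in range(epochs):
            h = sigmoid(x @ w)           # outputs interpreted as probabilities
            w += eta * x.T @ (d - h)     # Δw_j = η Σ_i (d_i − h(x_i)) x_ij
        return w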
Minimum Description Length Principle

Occam's razor: prefer the shortest hypothesis.

MDL: prefer the hypothesis h that minimizes

    h_MDL = argmin_{h in H} L_C1(h) + L_C2(D|h)

where L_C(x) is the description length of x under encoding C.

Example: H = decision trees, D = training data labels
• L_C1(h) is the number of bits needed to describe tree h
• L_C2(D|h) is the number of bits needed to describe D given h
  - Note L_C2(D|h) = 0 if the examples are classified perfectly by h; we need only describe the exceptions
• Hence h_MDL trades off tree size against training errors
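A minimal sketch of this trade-off, assuming the caller supplies a bit count for each candidate hypothesis and that exceptions are encoded by naming the misclassified example plus its correct label (both encodings are hypothetical):

    import math

    def mdl_score(h, data, hypothesis_bits):
        """L_C1(h) + L_C2(D|h) for one candidate hypothesis.

        hypothesis_bits(h) -> bits to encode h under encoding C1 (a placeholder).
        Exceptions cost about log2(m) bits to name plus 1 bit for the correct label.
        """
        m = len(data)
        exceptions = sum(1 for x, c in data if h(x) != c)
        return hypothesis_bits(h) + exceptions * (math.log2(m) + 1)

    def choose_mdl(hypotheses, data, hypothesis_bits):
        # Prefer the hypothesis with the smallest total description length
        return min(hypotheses, key=lambda h: mdl_score(h, data, hypothesis_bits))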