CS440/ECE448: Intro to Artificial Intelligence
Lecture 20: More on learning graphical models
Prof. Julia Hockenmaier
juliahmr@illinois.edu
http://cs.illinois.edu/fa11/cs440

Bayes Nets (recap)

A Bayes Net defines a joint distribution P(X_1, …, X_n) over a set of random variables X_1, …, X_n.

Using the chain rule, we can factor P(X_1, …, X_n) into a product of n conditional distributions:
  P(X_1, …, X_n) = ∏_{i=1..n} P(X_i | X_1, …, X_{i-1})

A Bayes Net makes a number of (conditional) independence assumptions:
  P(X_1, …, X_n) =def ∏_{i=1..n} P(X_i | Parents(X_i)), with Parents(X_i) ⊆ {X_1, …, X_{i-1}}

Learning Bayes Nets

Parameter estimation: Given some data D over a set of random variables X and a Bayes Net with empty CPTs, estimate the parameters of the Bayes Net (i.e., fill in the CPTs).

Structure learning: Given some data D over a set of random variables X, find a Bayes Net (define its graph structure) and estimate its parameters. (This is much harder; we won't deal with it here.)

Bayes Rule

  P(h | D) = P(D | h) P(h) / P(D)

P(h): prior probability of hypothesis h
P(h | D): posterior probability of hypothesis h
P(D | h): likelihood of the data, given hypothesis h

Posterior ∝ likelihood × prior:
  P(h | D) ∝ P(D | h) P(h)
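To make the factorization concrete, here is a minimal sketch, not from the slides: a hypothetical three-variable net A → B, A → C whose joint factors as P(A) P(B|A) P(C|A). All CPT values are made up for illustration.

```python
# Hypothetical CPTs for a tiny net A -> B, A -> C (values are made up).
P_A = {True: 0.2, False: 0.8}
P_B_given_A = {True: {True: 0.9, False: 0.1},   # P(B=b | A=a): outer key is a
               False: {True: 0.3, False: 0.7}}
P_C_given_A = {True: {True: 0.5, False: 0.5},
               False: {True: 0.1, False: 0.9}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a) * P(B=b | A=a) * P(C=c | A=a)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]

# Sanity check: the eight joint entries must sum to 1.
total = sum(joint(a, b, c) for a in (True, False)
                           for b in (True, False)
                           for c in (True, False))
assert abs(total - 1.0) < 1e-12

print(joint(True, True, False))  # 0.2 * 0.9 * 0.5 = 0.09
```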
Three kinds of estimation techniques

Bayes optimal: marginalize out the hypotheses:
  P(X | D) = Σ_i P(X | h_i) P(h_i | D)

MAP (maximum a posteriori): pick the hypothesis with the highest posterior:
  h_MAP = argmax_h P(h | D)

ML (maximum likelihood): pick the hypothesis that assigns the highest likelihood to the data:
  h_ML = argmax_h P(D | h)

Maximum likelihood learning

Given data D, we want to find the parameters that maximize P(D | θ).

We have a data set with N candies: c are cherry, and l = N - c are lime.
Parameter θ = probability of cherry.
Maximum likelihood estimate: θ = c/N

A more complex model

Now the candy has two kinds of wrappers (red or green). The wrapper is chosen probabilistically, depending on the flavor of the candy: the net has a "flavor" node (cherry with probability θ) whose child is a "wrapper" node with CPT

  F       P(red | F)
  cherry  θ_1
  lime    θ_2

Out of N candies, c are cherry; r_c are cherry with a red wrapper, and r_l are lime with a red wrapper.

The likelihood of this data set:
  P(d | θ, θ_1, θ_2) = θ^c (1-θ)^(N-c) · θ_1^(r_c) (1-θ_1)^(c-r_c) · θ_2^(r_l) (1-θ_2)^((N-c)-r_l)

The log likelihood of this data set:
  L(d | θ, θ_1, θ_2) = [c log θ + (N-c) log(1-θ)]
                     + [r_c log θ_1 + (c-r_c) log(1-θ_1)]
                     + [r_l log θ_2 + ((N-c)-r_l) log(1-θ_2)]

The log likelihood decouples into three independent one-parameter terms, so each parameter can be maximized separately, giving the ML estimates:
  θ = c/N    θ_1 = r_c/c    θ_2 = r_l/(N-c)
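As a check on the closed-form estimates, here is a minimal sketch; the counts are hypothetical, not from the slides. It computes θ, θ_1, θ_2 as relative frequencies and verifies that they score better than nearby parameter settings under the log likelihood above.

```python
import math

# Hypothetical counts (not from the slides), chosen for illustration.
N   = 1000   # total candies
c   = 600    # cherry candies
r_c = 540    # cherry candies with a red wrapper
r_l = 120    # lime candies with a red wrapper

# Closed-form ML estimates from the slide:
theta  = c / N          # P(cherry)               = 0.6
theta1 = r_c / c        # P(red wrapper | cherry) = 0.9
theta2 = r_l / (N - c)  # P(red wrapper | lime)   = 0.3

def log_lik(t, t1, t2):
    """Log likelihood L(d | t, t1, t2) from the slide."""
    return (c * math.log(t)    + (N - c) * math.log(1 - t)
          + r_c * math.log(t1) + (c - r_c) * math.log(1 - t1)
          + r_l * math.log(t2) + ((N - c) - r_l) * math.log(1 - t2))

# The closed-form estimates beat nearby parameter settings:
best = log_lik(theta, theta1, theta2)
assert best > log_lik(theta + 0.01, theta1, theta2)
assert best > log_lik(theta, theta1 - 0.01, theta2)
assert best > log_lik(theta, theta1, theta2 + 0.01)
```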
Medical diagnosis

Patients see a doctor and complain about a number of symptoms (headache, 100°F fever, …).

What is the most likely disease d_i, given the set of symptoms S the patient has?
  argmax_{d_i} P(d_i | S)

The Naïve Bayes classifier

Assume the items in your data set have a number of attributes A_1, …, A_n. Each item also belongs to one of a number of given classes C_1, …, C_k. Which attributes an item has depends on its class. If you only observe the attributes of an item, can you predict the class?

[Figures: two Bayes net diagrams. Left: a Disease node (values 1, 2, 3, …) with prior P(d_i) is the parent of Boolean nodes Symptom1, Symptom2, Symptom3, each with CPT entries P(s_j | d_i). Right: the generic Naïve Bayes net, with a class node C as the sole parent of attribute nodes A_1, A_2, …, A_n.]
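A minimal sketch of this prediction for the diagnosis net above; the diseases, symptoms, and all probability values are hypothetical placeholders, not from the lecture.

```python
# Hypothetical prior over diseases:
P_disease = {"cold": 0.6, "flu": 0.3, "allergy": 0.1}

# Hypothetical CPT entries P(symptom present | disease):
P_symptom = {
    "headache": {"cold": 0.4, "flu": 0.8, "allergy": 0.2},
    "fever":    {"cold": 0.1, "flu": 0.9, "allergy": 0.01},
}

def most_likely_disease(symptoms):
    """argmax_d P(d) * prod_s P(s | d), over the observed symptoms."""
    def score(d):
        p = P_disease[d]
        for s in symptoms:
            p *= P_symptom[s][d]
        return p
    return max(P_disease, key=score)

print(most_likely_disease(["headache", "fever"]))  # flu (0.216 beats 0.024, 0.0002)
```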
Naïve Bayes

  argmax_C P(C | A_1 … A_n)
    = argmax_C P(A_1 … A_n | C) P(C)    (Bayes rule; P(A_1 … A_n) does not depend on C)
    = argmax_C ∏_j P(A_j | C) P(C)      (Naïve Bayes assumption: attributes are conditionally independent given the class)

We need to estimate:
– the multinomial P(C)
– for each attribute A_j and class c: P(A_j | c)

Maximum likelihood estimation

If we have a set of training data where the class of each item is given:
– the multinomial: P(C = c) = freq(c)/N
– for each attribute A_j and class c: P(A_j = a | c) = freq(a, c)/freq(c)

where
  freq(c) = the number of items in the training data that have class c
  freq(a, c) = the number of items in the training data that have attribute value a and class c
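A minimal sketch of these frequency-ratio estimates; the train_naive_bayes helper and the toy data set are assumptions for illustration, not course code.

```python
from collections import Counter, defaultdict

def train_naive_bayes(items):
    """items: list of (attribute_dict, class_label) pairs.
    Returns P(C = c) and P(A_j = a | C = c) as relative frequencies."""
    N = len(items)
    class_freq = Counter(label for _, label in items)   # freq(c)
    pair_freq = defaultdict(Counter)                    # pair_freq[(j, c)][a] = freq(a, c)

    for attrs, c in items:
        for j, a in attrs.items():
            pair_freq[(j, c)][a] += 1

    prior = {c: class_freq[c] / N for c in class_freq}
    cond = {(j, c): {a: f / class_freq[c] for a, f in counter.items()}
            for (j, c), counter in pair_freq.items()}
    return prior, cond

# Tiny made-up training set: two attributes, two classes.
data = [({"fever": True,  "headache": True},  "flu"),
        ({"fever": True,  "headache": False}, "flu"),
        ({"fever": False, "headache": False}, "cold")]

prior, cond = train_naive_bayes(data)
print(prior["flu"])            # 2/3
print(cond[("fever", "flu")])  # {True: 1.0}
```

Note that pure frequency ratios assign probability zero to any attribute value never seen with a class, so in practice these counts are usually smoothed.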