Bayesian Decision Theory
Steven J Zeil
Old Dominion Univ.
Fall 2010
Outline

1. Classification
2. Losses & Risks
3. Discriminant Functions
4. Association Rules
Bernoulli Distribution

Random variable $X \in \{0, 1\}$

Bernoulli: $P\{X = x\} = p_0^x (1 - p_0)^{(1-x)}$

Given a sample $\mathcal{X} = \{x^t\}_{t=1}^N$, we can estimate $\hat{p}_0 = \frac{\sum_t x^t}{N}$
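As a concrete illustration (not from the original slides), here is a minimal sketch of that estimate; the sample values are made up for the example:

```python
import numpy as np

# Hypothetical sample of N Bernoulli draws, each x^t in {0, 1}
sample = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])

# Maximum-likelihood estimate: p_hat_0 = (sum of x^t) / N
p_hat = sample.sum() / len(sample)
print(f"p_hat_0 = {p_hat}")  # 0.7 for this sample
```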
Classification

Input $\vec{x} = [x_1, x_2]$, Output $C \in \{0, 1\}$

Prediction: choose $\hat{C} = 1$ if $P(C = 1 | \vec{x}) > 0.5$, and $\hat{C} = 0$ otherwise

Equivalently: choose $\hat{C} = 1$ if $P(C = 1 | \vec{x}) > P(C = 0 | \vec{x})$, and $\hat{C} = 0$ otherwise

E.g., credit scoring: inputs are income and savings; output is low-risk versus high-risk
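A minimal sketch of this two-class decision rule, assuming the posterior $P(C = 1 | \vec{x})$ has already been produced by some fitted model (the function and its argument are hypothetical):

```python
def choose_class(posterior_c1: float) -> int:
    """Two-class Bayes decision: pick class 1 iff P(C=1|x) > 0.5."""
    return 1 if posterior_c1 > 0.5 else 0

# e.g., an applicant with a high posterior is labeled class 1 (low-risk)
print(choose_class(0.82))  # -> 1
print(choose_class(0.31))  # -> 0
```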
Bayes' Rule

$$P(C | \vec{x}) = \frac{P(C)\, p(\vec{x} | C)}{p(\vec{x})}$$

$P(C | \vec{x})$: posterior probability. Given that we have learned something ($\vec{x}$), what is the probability that $\vec{x}$ is in class $C$?

$P(C)$: prior probability. What would we expect for the probability of getting something in $C$ if we had no info about the specific case?

$p(\vec{x} | C)$: likelihood. If we knew that an item really was in $C$, what is the probability that it would have values $\vec{x}$? In effect, the reverse of what we are trying to find out.

$p(\vec{x})$: evidence. If we ignore the classes, how likely are we to see a value $\vec{x}$?
Bayes' Rule - Multiple Classes

$$P(C_i | \vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})} = \frac{P(C_i)\, p(\vec{x} | C_i)}{\sum_{k=1}^{K} p(\vec{x} | C_k) P(C_k)}$$

where $P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$

Choose $C_i$ if $P(C_i | \vec{x}) = \max_k P(C_k | \vec{x})$
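A minimal sketch of the multi-class rule, assuming the priors and the per-class likelihood values $p(\vec{x} | C_k)$ for a particular $\vec{x}$ are already in hand (the numbers are made up):

```python
import numpy as np

# Hypothetical priors P(C_k) and likelihoods p(x | C_k) at one input x
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.10, 0.40, 0.25])

# Bayes' rule: posterior proportional to prior * likelihood;
# the evidence p(x) is the normalizing sum over all classes
evidence = np.sum(priors * likelihoods)
posteriors = priors * likelihoods / evidence

print(posteriors)             # sums to 1
print(np.argmax(posteriors))  # index of the chosen class
```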
Unequal Risks

In many situations, different actions carry different potential gains and costs.

Actions: $\alpha_i$

Let $\lambda_{ik}$ denote the loss incurred by taking action $\alpha_i$ when the current state is actually $C_k$

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i | \vec{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k | \vec{x})$$

This is simply the expected value of the loss function given that we have chosen $\alpha_i$

Choose $\alpha_i$ if $R(\alpha_i | \vec{x}) = \min_k R(\alpha_k | \vec{x})$
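A minimal sketch of minimum-risk action selection, assuming a hypothetical loss matrix $\lambda$ and posteriors like those computed above:

```python
import numpy as np

def min_risk_action(loss: np.ndarray, posteriors: np.ndarray) -> int:
    """Pick the action minimizing R(alpha_i|x) = sum_k loss[i,k] * P(C_k|x).

    loss       -- actions x classes matrix; loss[i, k] = lambda_ik
    posteriors -- P(C_k | x) for each class k
    """
    risks = loss @ posteriors  # R(alpha_i | x) for every action i
    return int(np.argmin(risks))

# Hypothetical asymmetric losses: row i = action alpha_i, column k = true class C_k
loss = np.array([[0.0, 10.0],   # acting for class 0 when truth is class 1 is costly
                 [1.0,  0.0]])
posteriors = np.array([0.8, 0.2])
print(min_risk_action(loss, posteriors))  # 1, despite class 0 being more probable
```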
Special Case: Equal Risks

Suppose
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i | \vec{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k | \vec{x}) = \sum_{k \ne i} P(C_k | \vec{x}) = 1 - P(C_i | \vec{x})$$

Choose $\alpha_i$ if $R(\alpha_i | \vec{x}) = \min_k R(\alpha_k | \vec{x})$, which happens when $P(C_i | \vec{x})$ is largest

So if all actions have equal cost, choose the action for the most probable class.
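A quick numeric check of this special case, using a 0/1 loss matrix and made-up posteriors:

```python
import numpy as np

posteriors = np.array([0.5, 0.3, 0.2])

# 0/1 loss: zero on the diagonal, one everywhere else
zero_one_loss = 1.0 - np.eye(3)

risks = zero_one_loss @ posteriors  # equals 1 - posteriors
print(risks)                                       # [0.5, 0.7, 0.8]
print(np.argmin(risks) == np.argmax(posteriors))   # True: min risk = max posterior
```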
Special Case: Indecision

Suppose that making the wrong decision is more expensive than making no decision at all (i.e., falling back to some other procedure)

Introduce a special reject action $\alpha_{K+1}$ that denotes the decision to not select a "real" action

Cost of a reject is $\lambda$, with $0 < \lambda < 1$:
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{if } i \ne k \end{cases}$$
The Risk of Indecision

Risk:
$$R(\alpha_{K+1} | \vec{x}) = \sum_{k=1}^{K} \lambda P(C_k | \vec{x}) = \lambda$$
$$R(\alpha_i | \vec{x}) = \sum_{k \ne i} P(C_k | \vec{x}) = 1 - P(C_i | \vec{x})$$

Choose $\alpha_i$ if $P(C_i | \vec{x}) > P(C_k | \vec{x})\ \forall k \ne i$ and $P(C_i | \vec{x}) > 1 - \lambda$; otherwise reject all actions
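A minimal sketch of the reject option, assuming a hypothetical reject cost `reject_cost` between 0 and 1:

```python
import numpy as np

REJECT = -1  # sentinel index standing in for the reject action alpha_{K+1}

def classify_with_reject(posteriors: np.ndarray, reject_cost: float) -> int:
    """Return the best class index, or REJECT if no posterior exceeds 1 - lambda."""
    best = int(np.argmax(posteriors))
    # Accept only when P(C_i|x) > 1 - lambda; otherwise rejecting is cheaper
    if posteriors[best] > 1.0 - reject_cost:
        return best
    return REJECT

posteriors = np.array([0.4, 0.35, 0.25])
print(classify_with_reject(posteriors, reject_cost=0.3))  # -1: 0.4 <= 0.7, reject
print(classify_with_reject(posteriors, reject_cost=0.7))  #  0: 0.4 >  0.3, accept
```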
Discriminant Functions

An alternate vision: instead of searching for the most probable class, we seek a set of functions that divide the space into $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:
$$\mathcal{R}_i = \left\{ \vec{x} \mid g_i(\vec{x}) = \max_k g_k(\vec{x}) \right\}$$
Why Discriminants?

Discriminant functions are more general than posterior probabilities: they do not have to lie in the range $0 \ldots 1$, nor correspond to actual probabilities.

This allows us to use them when we have no info about the underlying distribution.

Later techniques will seek discriminant functions directly.
Bayes Classifier as Discriminant Functions

We can form a discriminant function for the Bayes classifier very simply:
$$g_i(\vec{x}) = -R(\alpha_i | \vec{x})$$

If we have a constant loss function, we can use
$$g_i(\vec{x}) = P(C_i | \vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})}$$
Bayes Classifier as Discriminant Functions (cont.)

$$g_i(\vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})}$$

Because all the $g_i$ above would have the same denominator, we could alternatively use:
$$g_i(\vec{x}) = P(C_i)\, p(\vec{x} | C_i)$$
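A minimal sketch of these unnormalized discriminants, assuming (purely for illustration) one-dimensional Gaussian class likelihoods with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical priors and Gaussian likelihood parameters for two classes
priors = np.array([0.6, 0.4])
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

def discriminants(x: float) -> np.ndarray:
    """g_i(x) = P(C_i) * p(x | C_i); no need to divide by the evidence p(x)."""
    return priors * norm.pdf(x, loc=means, scale=stds)

g = discriminants(1.2)
print(g)             # unnormalized scores, not probabilities
print(np.argmax(g))  # same decision as using the full posteriors
```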
Association Rules

Suppose that we want to learn an association rule $X \to Y$