Lecture 9:
− Naïve Bayes Classifier (cont’d.)
− Logistic Regression
− Discriminative vs. Generative Classification
− Linear Discriminant Functions

Aykut Erdem
October 2016
Hacettepe University
Last time… Naïve Bayes Classifier

Given:
– Class prior P(Y)
– d conditionally independent features X_1, …, X_d given the class label Y
– For each feature X_i, the conditional likelihood P(X_i | Y)

Naïve Bayes decision rule:

y* = argmax_y P(Y = y) ∏_{i=1}^d P(X_i = x_i | Y = y)

slide by Barnabás Póczos & Aarti Singh
Last time… Naïve Bayes Algorithm for discrete features

We need to estimate these probabilities!

Estimators:
– For class prior: P̂(Y = y) = #{j : y^j = y} / n
– For likelihood: P̂(X_i = x | Y = y) = #{j : x_i^j = x, y^j = y} / #{j : y^j = y}

NB prediction for test data: y* = argmax_y P̂(y) ∏_{i=1}^d P̂(x_i | y)

slide by Barnabás Póczos & Aarti Singh
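A minimal Python sketch of these estimators and the decision rule (variable names are illustrative, not from the slides; plain MLE counts, no smoothing):

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """MLE estimates for discrete-feature Naive Bayes.
    X: list of feature tuples; y: list of class labels."""
    n = len(y)
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    # likelihood[c][i][v] approximates P(X_i = v | Y = c)
    likelihood = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            likelihood[c][i][v] += 1
    for c in likelihood:
        for i in likelihood[c]:
            total = sum(likelihood[c][i].values())
            for v in likelihood[c][i]:
                likelihood[c][i][v] /= total
    return prior, likelihood

def predict_nb(xs, prior, likelihood):
    """Decision rule: argmax_y log P(y) + sum_i log P(x_i | y)."""
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(likelihood[c][i].get(v, 1e-12))  # tiny floor for unseen values
            for i, v in enumerate(xs))
    return max(prior, key=score)
```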
Last time… Text Classification

MEDLINE Article → MeSH Subject Category Hierarchy?
• Antagonists and Inhibitors
• Blood Supply
• Chemistry
• Drug Therapy
• Embryology
• Epidemiology
• …

slide by Dan Jurafsky
Last time… Bag of words model

Typical additional assumption: position in the document doesn’t matter:
P(X_i = x_i | Y = y) = P(X_k = x_i | Y = y)
– “Bag of words” model – order of words on the page is ignored
– The document is just a bag of words: i.i.d. words
– Sounds really silly, but often works very well!

K(50000 − 1) parameters to estimate (for K classes and a 50,000-word vocabulary).

The probability of a document with words x_1, x_2, …:
P(x_1, x_2, … | Y = y) = ∏_i P(X_i = x_i | Y = y)

slide by Barnabás Póczos & Aarti Singh
The bag of words representation

γ(d) = c

“I love this movie! It’s sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I’ve seen it several times, and I’m always happy to see it again whenever I have a friend who hasn’t seen it yet.”

slide by Dan Jurafsky
The bag of words representation: using a subset of words

γ(d) = c

Only selected words are kept: love, sweet, satirical, great, fun, whimsical, romantic, laughing, recommend, several, happy, again.

slide by Dan Jurafsky
The bag of words representation

γ(d) = c

great      2
love       2
recommend  1
laugh      1
happy      1
…          …

slide by Dan Jurafsky
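A tiny sketch of how such a count vector can be built (whitespace tokenization is a deliberate simplification):

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Discard word order; keep only word identities and counts."""
    return Counter(document.lower().split())

counts = bag_of_words("Great dialogue, great fun. I love it and recommend it.")
# e.g. counts["great"] == 2 -- positions in the document no longer matter
```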
Worked example with add-1 (Laplace) smoothing:

P̂(c) = N_c / N
P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|)

Doc   Words                                Class
Training
1     Chinese Beijing Chinese              c
2     Chinese Chinese Shanghai             c
3     Chinese Macao                        c
4     Tokyo Japan Chinese                  j
Test
5     Chinese Chinese Chinese Tokyo Japan  ?

Priors:
P(c) = 3/4
P(j) = 1/4

Conditional probabilities:
P(Chinese | c) = (5+1) / (8+6) = 6/14 = 3/7
P(Tokyo | c)   = (0+1) / (8+6) = 1/14
P(Japan | c)   = (0+1) / (8+6) = 1/14
P(Chinese | j) = (1+1) / (3+6) = 2/9
P(Tokyo | j)   = (1+1) / (3+6) = 2/9
P(Japan | j)   = (1+1) / (3+6) = 2/9

Choosing a class:
P(c | d_5) ∝ 3/4 × (3/7)³ × 1/14 × 1/14 ≈ 0.0003
P(j | d_5) ∝ 1/4 × (2/9)³ × 2/9 × 2/9 ≈ 0.0001

slide by Dan Jurafsky
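The numbers above can be checked with a few lines of Python (a sketch mirroring the table):

```python
from collections import Counter

train = [("Chinese Beijing Chinese", "c"),
         ("Chinese Chinese Shanghai", "c"),
         ("Chinese Macao", "c"),
         ("Tokyo Japan Chinese", "j")]
test = "Chinese Chinese Chinese Tokyo Japan"

vocab = {w for doc, _ in train for w in doc.split()}   # |V| = 6
counts = {c: Counter() for c in {"c", "j"}}
for doc, c in train:
    counts[c].update(doc.split())

def posterior(c):
    prior = sum(1 for _, y in train if y == c) / len(train)
    total = sum(counts[c].values())
    p = prior
    for w in test.split():
        p *= (counts[c][w] + 1) / (total + len(vocab))  # add-1 smoothing
    return p

print(posterior("c"))  # ~0.0003
print(posterior("j"))  # ~0.0001
```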
Twenty Newsgroups results

Naïve Bayes: 89% accuracy

slide by Barnabás Póczos & Aarti Singh
What if features are continuous?

e.g., character recognition: X_i is the intensity at the i-th pixel.

Gaussian Naïve Bayes (GNB):

P(X_i = x | Y = y_k) = (1 / (σ_ik √(2π))) exp( −(x − μ_ik)² / (2σ_ik²) )

• Different mean and variance for each class k and each pixel i.
• Sometimes assume the variance
  – is independent of Y (i.e., σ_i),
  – or independent of X_i (i.e., σ_k),
  – or both (i.e., σ).

slide by Barnabás Póczos & Aarti Singh
Estimating parameters: Y discrete, X_i continuous

Maximum likelihood estimates:

μ̂_ik = Σ_j X_i^j δ(Y^j = y_k) / Σ_j δ(Y^j = y_k)

σ̂²_ik = Σ_j (X_i^j − μ̂_ik)² δ(Y^j = y_k) / ( Σ_j δ(Y^j = y_k) − 1 )

where k indexes the class, j indexes the training image, and X_i^j is the i-th pixel in the j-th training image.

slide by Barnabás Póczos & Aarti Singh
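In numpy these estimates are a few lines per class (a sketch; X is assumed to be an n×d array of pixel intensities and y a length-n label vector):

```python
import numpy as np

def fit_gnb(X, y):
    """Per-class, per-pixel mean and variance, matching the estimates above."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]                    # all training images of class k
        mu = Xk.mean(axis=0)              # mu_hat_{ik} for every pixel i
        var = Xk.var(axis=0, ddof=1)      # sigma_hat^2_{ik} (n_k - 1 denominator)
        params[k] = (mu, var)
    return params

def class_log_score(x, mu, var, prior):
    """log P(y_k) + sum_i log N(x_i; mu_ik, sigma^2_ik), for the NB decision rule."""
    return (np.log(prior)
            - 0.5 * np.sum(np.log(2 * np.pi * var))
            - 0.5 * np.sum((x - mu) ** 2 / var))
```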
Case Study: Classifying Mental States
Example: GNB for classifying mental states (fMRI)

• ~1 mm resolution
• ~2 images per sec.
• 15,000 voxels/image
• non-invasive, safe
• measures Blood Oxygen Level Dependent (BOLD) response

[Mitchell et al.]

slide by Barnabás Póczos & Aarti Singh
Brain scans can track activation with precision and sensitivity.

[figure: fMRI activation maps]

slide by Barnabás Póczos & Aarti Singh
Learned Naïve Bayes Models – Means for P(BrainActivity | WordCategory)

Pairwise classification accuracy: 78–99%, 12 participants [Mitchell et al.]

[figure: mean activation maps for “Tool words” vs. “Building words”]

slide by Barnabás Póczos & Aarti Singh
What you should know…

Naïve Bayes classifier
• What the assumption is
• Why we use it
• How we learn it
• Why Bayesian (MAP) estimation is important

Text classification
• Bag of words model

Gaussian NB
• Features are still conditionally independent
• Each feature has a Gaussian distribution given the class

slide by Barnabás Póczos & Aarti Singh
Logistic Regression
Last time… Naïve Bayes

• NB Assumption: P(X_1, …, X_d | Y) = ∏_{i=1}^d P(X_i | Y)
• NB Classifier: f_NB(x) = argmax_y P(x_1, …, x_d | y) P(y)
• Assume parametric form for P(X_i | Y) and P(Y)
  – Estimate parameters using MLE/MAP and plug in

slide by Aarti Singh & Barnabás Póczos
Gaussian Naïve Bayes (GNB)

• There are several distributions that can lead to a linear decision boundary.
• As an example, consider Gaussian Naïve Bayes with Gaussian class-conditional densities:

P(X_i = x | Y = y_k) = (1 / (σ_ik √(2π))) exp( −(x − μ_ik)² / (2σ_ik²) )

• What if we assume the variance is independent of the class, i.e. σ_ik = σ_i?

slide by Aarti Singh & Barnabás Póczos
GNB with equal variance is a Linear Classifier!

Decision boundary:

∏_{i=1}^d P(X_i | Y = 0) · P(Y = 0) = ∏_{i=1}^d P(X_i | Y = 1) · P(Y = 1)

Taking logs, with π = P(Y = 1):

log [ P(Y=0) ∏_{i=1}^d P(X_i | Y=0) / ( P(Y=1) ∏_{i=1}^d P(X_i | Y=1) ) ]
  = log( (1 − π) / π ) + Σ_{i=1}^d log( P(X_i | Y=0) / P(X_i | Y=1) )

constant term + first-order term

slide by Aarti Singh & Barnabás Póczos
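Plugging the equal-variance Gaussians into the sum makes the linearity explicit (a standard expansion; note the quadratic terms cancel only because σ_i is shared by both classes):

```latex
\log \frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)}
  = \frac{(X_i - \mu_{i1})^2 - (X_i - \mu_{i0})^2}{2\sigma_i^2}
  = \frac{\mu_{i0} - \mu_{i1}}{\sigma_i^2}\, X_i
    + \frac{\mu_{i1}^2 - \mu_{i0}^2}{2\sigma_i^2}
```

So the log-odds is an affine function of X, i.e. the decision boundary is a hyperplane.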
Gaussian Naive Bayes (GNB) Decision Boundary

X = (x_1, x_2)
P_1 = P(Y = 0),  P_2 = P(Y = 1)
p_1(X) = p(X | Y = 0) ∼ N(M_1, Σ_1)
p_2(X) = p(X | Y = 1) ∼ N(M_2, Σ_2)

slide by Aarti Singh & Barnabás Póczos
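A small numpy sketch of this boundary (means and covariances are illustrative): with a shared covariance the log-odds below is linear in x; with Σ_1 ≠ Σ_2 the quadratic terms survive and the boundary curves.

```python
import numpy as np

def log_odds(x, m1, m2, S1, S2, p1=0.5, p2=0.5):
    """log[p1 * N(x; m1, S1)] - log[p2 * N(x; m2, S2)]; zero on the boundary.
    The shared (2*pi)^(d/2) factors cancel in the difference."""
    def log_gauss(x, m, S):
        d = x - m
        return -0.5 * (d @ np.linalg.solve(S, d) + np.log(np.linalg.det(S)))
    return np.log(p1 / p2) + log_gauss(x, m1, S1) - log_gauss(x, m2, S2)

m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
shared = np.eye(2)                             # Sigma_1 = Sigma_2 -> linear boundary
distinct = np.array([[2.0, 0.0], [0.0, 0.5]])  # Sigma_1 != Sigma_2 -> quadratic boundary
print(log_odds(np.array([1.0, 1.0]), m1, m2, shared, shared))  # 0 on the midpoint line
```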
Generative vs. Discriminative Classifiers

• Generative classifiers (e.g. Naïve Bayes)
  – Assume some functional form for P(X,Y) (or P(X|Y) and P(Y))
  – Estimate parameters of P(X|Y), P(Y) directly from training data
  – But argmax_Y P(X|Y) P(Y) = argmax_Y P(Y|X)
• Why not learn P(Y|X) directly? Or better yet, why not learn the decision boundary directly?
• Discriminative classifiers (e.g. Logistic Regression)
  – Assume some functional form for P(Y|X) or for the decision boundary
  – Estimate parameters of P(Y|X) directly from training data

slide by Aarti Singh & Barnabás Póczos
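The argmax equivalence holds because P(Y|X) = P(X|Y) P(Y) / P(X) and the denominator P(X) does not depend on Y. A toy numeric check (probabilities made up for illustration):

```python
# Illustrative numbers: P(X|y) * P(y) for two classes.
joint = {"y0": 0.20 * 0.6,   # P(X | y0) * P(y0)
         "y1": 0.05 * 0.4}   # P(X | y1) * P(y1)
evidence = sum(joint.values())                           # P(X)
posterior = {y: p / evidence for y, p in joint.items()}  # P(y | X)
assert max(joint, key=joint.get) == max(posterior, key=posterior.get)
```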
Logistic Regression

Assumes the following functional form for P(Y | X):

P(Y = 1 | X) = 1 / ( 1 + exp( −(w_0 + Σ_i w_i X_i) ) )

Logistic function applied to a linear function of the data.

Logistic function (or sigmoid): σ(z) = 1 / (1 + e^(−z))

Features can be discrete or continuous!

slide by Aarti Singh & Barnabás Póczos
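A minimal sketch of this functional form in Python (weights here are arbitrary example values):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, w0):
    """P(Y = 1 | X = x) = sigmoid(w0 + w . x): linear score, logistic link."""
    return sigmoid(w0 + np.dot(w, x))

# Works for discrete (0/1) or continuous features alike.
print(predict_proba(np.array([1.0, 0.0, 2.5]), np.array([0.3, -1.2, 0.8]), -0.5))
```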