
Lecture 9: Naïve Bayes Classifier (cont'd.) and Logistic Regression



  1. Lecture 9:
     − Naïve Bayes Classifier (cont'd.)
     − Logistic Regression
     − Discriminative vs. Generative Classification
     − Linear Discriminant Functions
     Aykut Erdem, October 2016, Hacettepe University

  2. Last time… Naïve Bayes Classifier. Given: the class prior P(Y), and d conditionally independent features X_1, …, X_d given the class label Y; for each feature X_i we have the conditional likelihood P(X_i | Y). Naïve Bayes decision rule:
     $$f_{NB}(x) = \arg\max_y P(y) \prod_{i=1}^{d} P(x_i \mid y)$$
     slide by Barnabás Póczos & Aarti Singh

  3. Last time… Naïve Bayes Algorithm for discrete features. We need to estimate these probabilities! Estimators: for the class prior,
     $$\hat{P}(Y = y) = \frac{\#\{j : y^j = y\}}{n}$$
     and for the likelihood,
     $$\hat{P}(X_i = x \mid Y = y) = \frac{\#\{j : x_i^j = x,\ y^j = y\}}{\#\{j : y^j = y\}}$$
     NB prediction for test data: plug these estimates into the decision rule above. slide by Barnabás Póczos & Aarti Singh
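
As a concrete illustration (my sketch, not from the slides), these estimators are plain relative-frequency counts; all names and the toy data below are illustrative.

```python
from collections import Counter, defaultdict

def fit_nb(X, y):
    """Estimate class prior and per-feature likelihoods by counting."""
    n = len(y)
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    # likelihood[c][i][v] approximates P(X_i = v | Y = c)
    likelihood = {c: defaultdict(Counter) for c in prior}
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            likelihood[c][i][v] += 1
    for c in likelihood:
        n_c = sum(1 for label in y if label == c)
        for i in likelihood[c]:
            for v in likelihood[c][i]:
                likelihood[c][i][v] /= n_c
    return prior, likelihood

def predict_nb(xs, prior, likelihood):
    """NB decision rule: argmax_y P(y) * prod_i P(x_i | y)."""
    def score(c):
        p = prior[c]
        for i, v in enumerate(xs):
            p *= likelihood[c][i][v]  # Counter returns 0 for unseen values
        return p
    return max(prior, key=score)

# Toy usage: two binary features, two classes.
X = [(1, 0), (1, 1), (0, 0), (0, 1)]
y = ['+', '+', '-', '-']
prior, likelihood = fit_nb(X, y)
print(predict_nb((1, 1), prior, likelihood))  # -> '+'
```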

  4. Last time… Text Classification: which category from the MeSH Subject Category Hierarchy does a MEDLINE article belong to?
     • Antagonists and Inhibitors
     • Blood Supply
     • Chemistry
     • Drug Therapy
     • Embryology
     • Epidemiology
     • …
     slide by Dan Jurafsky

  5. Last time… Bag of words model. Typical additional assumption: position in the document doesn't matter: P(X_i = x_i | Y = y) = P(X_k = x_i | Y = y). The "bag of words" model ignores the order of words on the page; the document is just a bag of i.i.d. words. Sounds really silly, but often works very well! With a 50,000-word vocabulary and K classes, this leaves K(50000 − 1) parameters to estimate. The probability of a document with words x_1, x_2, … is
     $$P(x_1, x_2, \ldots \mid Y = y) = \prod_{j} P(x_j \mid Y = y)$$
     slide by Barnabás Póczos & Aarti Singh

  6. The bag of words representation. γ(d) = c: the classifier γ assigns the document d, here a movie review, to a class c. The review: "I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet." slide by Dan Jurafsky

  7. The bag of words representation: the same review, now viewed simply as a collection of word tokens. γ(d) = c. slide by Dan Jurafsky

  8. The bag of words representation: using a subset of words. All other words are masked out (the runs of x's in the original slide), leaving only: love, sweet, satirical, great, fun, whimsical, romantic, laughing, recommend, several, happy, again. γ(d) = c. slide by Dan Jurafsky

  9. The bag of words representation as word counts, γ(d) = c:
     great     2
     love      2
     recommend 1
     laugh     1
     happy     1
     …         …
     slide by Dan Jurafsky
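
To make this concrete, here is a tiny Python sketch (illustrative, not from the slides) that reduces a document to the bag-of-words counts shown above; the regex tokenizer is deliberately naive.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Order-free representation: the multiset of lowercased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

review = "It's great and I love it. Great fun! I recommend it; we laugh, we are happy."
print(bag_of_words(review).most_common(5))
# [('great', 2), ('i', 2), ('it', 2), ('we', 2), ("it's", 1)]
```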

  10. Worked example with add-1 (Laplace) smoothing:
      Training:
        Doc 1: Chinese Beijing Chinese → c
        Doc 2: Chinese Chinese Shanghai → c
        Doc 3: Chinese Macao → c
        Doc 4: Tokyo Japan Chinese → j
      Test:
        Doc 5: Chinese Chinese Chinese Tokyo Japan → ?
      Estimators:
      $$\hat{P}(c) = \frac{N_c}{N} \qquad \hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$$
      Priors: P(c) = 3/4, P(j) = 1/4.
      Conditional probabilities (|V| = 6; class c has 8 word tokens, class j has 3):
        P(Chinese | c) = (5+1) / (8+6) = 6/14 = 3/7
        P(Tokyo | c)   = (0+1) / (8+6) = 1/14
        P(Japan | c)   = (0+1) / (8+6) = 1/14
        P(Chinese | j) = (1+1) / (3+6) = 2/9
        P(Tokyo | j)   = (1+1) / (3+6) = 2/9
        P(Japan | j)   = (1+1) / (3+6) = 2/9
      Choosing a class:
        P(c | d5) ∝ 3/4 × (3/7)³ × 1/14 × 1/14 ≈ 0.0003
        P(j | d5) ∝ 1/4 × (2/9)³ × 2/9 × 2/9 ≈ 0.0001
      so the test document is labeled c. slide by Dan Jurafsky
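
The same computation in a few lines of Python (my sketch, not from the slides), reproducing ≈0.0003 for c and ≈0.0001 for j:

```python
from collections import Counter
from math import prod

train = [("Chinese Beijing Chinese", "c"),
         ("Chinese Chinese Shanghai", "c"),
         ("Chinese Macao", "c"),
         ("Tokyo Japan Chinese", "j")]
test = "Chinese Chinese Chinese Tokyo Japan"

docs = {c: [] for _, c in train}      # all word tokens per class
for text, c in train:
    docs[c] += text.split()
vocab = {w for text, _ in train for w in text.split()}

def posterior(c):
    """P(c | d) up to a constant: prior times add-1-smoothed likelihoods."""
    prior = sum(1 for _, lab in train if lab == c) / len(train)
    counts = Counter(docs[c])
    return prior * prod((counts[w] + 1) / (len(docs[c]) + len(vocab))
                        for w in test.split())

for c in docs:
    print(c, posterior(c))  # c ≈ 0.0003, j ≈ 0.0001
```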

  11. Twenty Newsgroups results: Naïve Bayes achieves 89% accuracy. slide by Barnabás Póczos & Aarti Singh

  12. What if features are continuous? e.g., character recognition: X_i is the intensity at the i-th pixel. Gaussian Naïve Bayes (GNB):
      $$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}}\, \exp\!\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$
      Different mean and variance for each class k and each pixel i. Sometimes we assume the variance is independent of Y (i.e., σ_i), or independent of X_i (i.e., σ_k), or both (i.e., σ). slide by Barnabás Póczos & Aarti Singh

  13. Estimating parameters: Y discrete, X_i continuous. Maximum likelihood estimate for the class-conditional mean:
      $$\hat{\mu}_{ik} = \frac{\sum_j X_i^j\, \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}$$
      slide by Barnabás Póczos & Aarti Singh

  14. Estimating parameters: Y discrete, X_i continuous. Maximum likelihood estimates:
      $$\hat{\mu}_{ik} = \frac{\sum_j X_i^j\, \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)} \qquad \hat{\sigma}_{ik}^2 = \frac{\sum_j \left(X_i^j - \hat{\mu}_{ik}\right)^2 \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}$$
      where k indexes the class, j the training image, and X_i^j is the i-th pixel in the j-th training image. slide by Barnabás Póczos & Aarti Singh
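
In code, these MLEs are just per-class masked means and variances. A minimal NumPy sketch (mine, not from the slides; the data and shapes are illustrative):

```python
import numpy as np

def fit_gnb(X, y):
    """MLE of per-class, per-feature Gaussian parameters.

    X: (n, d) array of continuous features (e.g., pixel intensities)
    y: (n,) array of integer class labels
    Returns priors (K,), means (K, d), variances (K, d).
    """
    classes = np.unique(y)
    priors = np.array([(y == k).mean() for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    variances = np.array([X[y == k].var(axis=0) for k in classes])  # 1/N_k, the MLE
    return priors, means, variances

# Toy usage: 6 "images" with 3 "pixels" each, 2 classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.1, (3, 3)), rng.normal(0.8, 0.1, (3, 3))])
y = np.array([0, 0, 0, 1, 1, 1])
priors, means, variances = fit_gnb(X, y)
print(means.round(2))  # row k holds mu_{ik} for each pixel i
```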

  15. Case Study: Classifying Mental States

  16. Example: GNB for classifying mental states. Brain imaging: ~1 mm resolution, ~2 images per second, 15,000 voxels/image; non-invasive and safe; measures the Blood Oxygen Level Dependent (BOLD) response. [Mitchell et al.] slide by Barnabás Póczos & Aarti Singh

  17. Brain scans can track activation with precision and sensitivity. [figure omitted] slide by Barnabás Póczos & Aarti Singh

  18. Learned Naïve Bayes models: means for P(BrainActivity | WordCategory), shown for "Tool words" vs. "Building words". Pairwise classification accuracy: 78-99%, 12 participants. [Mitchell et al.] slide by Barnabás Póczos & Aarti Singh

  19. What you should know…
      Naïve Bayes classifier:
      • What the assumption is
      • Why we use it
      • How we learn it
      • Why Bayesian (MAP) estimation is important
      Text classification:
      • Bag of words model
      Gaussian NB:
      • Features are still conditionally independent
      • Each feature has a Gaussian distribution given the class
      slide by Barnabás Póczos & Aarti Singh

  20. Logistic Regression

  21. Last time… Naïve Bayes
      • NB assumption: $P(X_1, \ldots, X_d \mid Y) = \prod_{i=1}^{d} P(X_i \mid Y)$
      • NB classifier: $f_{NB}(x) = \arg\max_y P(y) \prod_{i=1}^{d} P(x_i \mid y)$
      • Assume a parametric form for P(X_i | Y) and P(Y); estimate the parameters using MLE/MAP and plug them in.
      slide by Aarti Singh & Barnabás Póczos

  22. Gaussian Naïve Bayes (GNB)
      • There are several distributions that can lead to a linear decision boundary. As an example, consider Gaussian Naïve Bayes with Gaussian class-conditional densities:
      $$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}}\, \exp\!\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$
      • What if we assume the variance is independent of the class, i.e. $\sigma_{ik} = \sigma_i$?
      slide by Aarti Singh & Barnabás Póczos

  23. GNB with equal variance is a linear classifier! Decision boundary:
      $$\prod_{i=1}^{d} P(X_i \mid Y = 0)\, P(Y = 0) = \prod_{i=1}^{d} P(X_i \mid Y = 1)\, P(Y = 1)$$
      slide by Aarti Singh & Barnabás Póczos

  24. GNB with equal variance is a linear classifier! Taking logs of the decision boundary, with $\pi = P(Y = 1)$:
      $$0 = \log \frac{P(Y = 0) \prod_{i=1}^{d} P(X_i \mid Y = 0)}{P(Y = 1) \prod_{i=1}^{d} P(X_i \mid Y = 1)} = \log \frac{1 - \pi}{\pi} + \sum_{i=1}^{d} \log \frac{P(X_i \mid Y = 0)}{P(X_i \mid Y = 1)}$$
      slide by Aarti Singh & Barnabás Póczos

  25. GNB with equal variance is a linear classifier! In the log decision boundary
      $$0 = \log \frac{1 - \pi}{\pi} + \sum_{i=1}^{d} \log \frac{P(X_i \mid Y = 0)}{P(X_i \mid Y = 1)}$$
      the first summand is a constant term and, once the Gaussian densities with shared variance $\sigma_i$ are plugged in, the quadratic terms in $x_i$ cancel, so the sum is a first-order term: the boundary has the linear form $w_0 + \sum_i w_i x_i = 0$.
      slide by Aarti Singh & Barnabás Póczos
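
A quick numeric check of this claim (my sketch, not from the slides; all parameter values are illustrative): with shared per-feature variances, the GNB log-odds is an affine function of x.

```python
import numpy as np

# Illustrative GNB parameters: 2 classes, d = 3 features, shared variances.
mu0 = np.array([0.0, 1.0, -1.0])    # class-0 means mu_{i0}
mu1 = np.array([1.0, -1.0, 0.5])    # class-1 means mu_{i1}
sigma = np.array([0.5, 1.0, 2.0])   # shared per-feature std sigma_i
pi = 0.3                            # P(Y = 1)

def log_odds(x):
    """log [ P(Y=0) prod_i P(x_i|Y=0) / (P(Y=1) prod_i P(x_i|Y=1)) ]."""
    # Normalizers cancel because the variances are shared across classes.
    return (np.log((1 - pi) / pi)
            + np.sum((-(x - mu0) ** 2 + (x - mu1) ** 2) / (2 * sigma ** 2)))

# The quadratic terms cancel, leaving w0 + w.x with:
w = (mu0 - mu1) / sigma ** 2
w0 = np.log((1 - pi) / pi) + np.sum((mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2))

x = np.array([0.3, -0.7, 2.0])
print(log_odds(x), w0 + w @ x)  # identical up to floating point
```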

  26. Gaussian Naïve Bayes (GNB) decision boundary (figure). Setup: $X = (x_1, x_2)$, priors $P_1 = P(Y = 0)$ and $P_2 = P(Y = 1)$, class-conditional densities $p_1(X) = p(X \mid Y = 0) \sim \mathcal{N}(M_1, \Sigma_1)$ and $p_2(X) = p(X \mid Y = 1) \sim \mathcal{N}(M_2, \Sigma_2)$. slide by Aarti Singh & Barnabás Póczos

  27. Generative vs. Discriminative Classifiers
      • Generative classifiers (e.g. Naïve Bayes)
        - Assume some functional form for P(X, Y) (or for P(X|Y) and P(Y))
        - Estimate the parameters of P(X|Y) and P(Y) directly from training data
      • But $\arg\max_Y P(X|Y)\,P(Y) = \arg\max_Y P(Y|X)$. Why not learn P(Y|X) directly? Or better yet, why not learn the decision boundary directly?
      • Discriminative classifiers (e.g. Logistic Regression)
        - Assume some functional form for P(Y|X) or for the decision boundary
        - Estimate the parameters of P(Y|X) directly from training data
      slide by Aarti Singh & Barnabás Póczos

  28. Logistic Regression. Assumes the following functional form for P(Y|X): the logistic function applied to a linear function of the data,
      $$P(Y = 1 \mid X) = \frac{1}{1 + \exp\!\left(-\left(w_0 + \sum_i w_i X_i\right)\right)}$$
      where the logistic (or sigmoid) function is $\mathrm{logistic}(z) = \frac{1}{1 + e^{-z}}$. Features can be discrete or continuous!
      slide by Aarti Singh & Barnabás Póczos
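
A minimal sketch of this functional form (the weights and features below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w0, w):
    """P(Y = 1 | X = x) under the logistic regression model."""
    return sigmoid(w0 + w @ x)

w0, w = -1.0, np.array([2.0, -0.5])  # illustrative weights
x = np.array([1.2, 0.3])             # features may be discrete or continuous
p = predict_proba(x, w0, w)
print(p, "-> class", int(p >= 0.5))  # ~0.777 -> class 1
```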
