Naïve Bayes

  1. Classification, Linear Models, Naïve Bayes CMSC 470 Marine Carpuat Slides credit: Dan Jurafsky & James Martin, Jacob Eisenstein

  2. Today • Text classification problems • and their evaluation • Linear classifiers • Features & Weights • Bag of words • Naïve Bayes

  3. Classification problems

  4. Multiclass Classification [Diagram: at training time, labeled documents (label 1 through label 4) are mapped by feature functions into inputs for a supervised machine learning algorithm, which produces a classifier; at test time, the classifier predicts a label for each unlabeled document.]

  5. Is this spam? From: "Fabian Starr" <Patrick_Freeman@pamietaniepeerelu.pl> Subject: Hey! Sofware for the funny prices! Get the great discounts on popular software today for PC and Macintosh http://iiled.org/Cj4Lmx 70-90% Discounts from retail price!!! All sofware is instantly available to download - No Need Wait!

  6. What is the subject of this article? [Diagram: a MEDLINE article to be assigned a category from the MeSH subject category hierarchy] • Antagonists and Inhibitors • Blood Supply • Chemistry • Drug Therapy • Embryology • Epidemiology • …

  7. Text Classification • Assigning subject categories, topics, or genres • Spam detection • Authorship identification • Age/gender identification • Language Identification • Sentiment analysis • …

  8. Text Classification: definition • Input: a document d and a fixed set of classes Y = {y1, y2, …, yJ} • Output: a predicted class y ∈ Y
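For instance, in the spam example above, Y = {spam, not spam} and d is the email itself.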

  9. Classification Methods: Supervised Machine Learning • Input: a document d; a fixed set of classes Y = {y1, y2, …, yJ}; a training set of m hand-labeled documents (d1, y1), …, (dm, ym) • Output: a learned classifier d → y

  10. Aside: getting examples for supervised learning • Human annotation • By experts or non-experts (crowdsourcing) • Found data • How do we know how good a classifier is? • Compare classifier predictions with human annotation • On held out test examples • Evaluation metrics: accuracy, precision, recall

  11. The 2-by-2 contingency table

                     correct   not correct
      selected       tp        fp
      not selected   fn        tn

  12. Precision and recall • Precision: % of selected items that are correct • Recall: % of correct items that are selected

                     correct   not correct
      selected       tp        fp
      not selected   fn        tn
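In terms of the table cells, these definitions become Precision = tp / (tp + fp) and Recall = tp / (tp + fn).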

  13. A combined measure: F • A combined measure that assesses the P/R tradeoff is the F measure (weighted harmonic mean):

      F = 1 / (α(1/P) + (1 − α)(1/R)) = (β² + 1)PR / (β²P + R)

      • People usually use the balanced F1 measure, i.e., with β = 1 (that is, α = ½): F = 2PR / (P + R)
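As a quick check (numbers invented for illustration): with P = 0.5 and R = 0.25, F1 = 2 · 0.5 · 0.25 / (0.5 + 0.25) ≈ 0.33, which is lower than the arithmetic mean of 0.375; the harmonic mean penalizes imbalance between precision and recall.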

  14. Linear Models for Multiclass Classification

  15. Linear Models for Classification • Feature function representation • Weights
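The equations from this slide were lost in extraction; in a standard formulation of this setup (my reconstruction, not verbatim from the slide), each candidate label y is scored with a weight vector θ and a feature function f:

      ŷ = argmax_y θ · f(x, y)

where f(x, y) maps an input/label pair to a feature vector and θ holds one learned weight per feature.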

  16. Defining features: Bag of words
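As a concrete illustration (a minimal sketch, not code from the slides), a bag-of-words representation keeps only how often each word occurs, discarding word order:

```python
from collections import Counter

def bag_of_words(document):
    """Map a document to word-count features, ignoring word order."""
    tokens = document.lower().split()   # naive whitespace tokenization
    return Counter(tokens)

print(bag_of_words("Get the great discounts on popular software today"))
# Counter({'get': 1, 'the': 1, 'great': 1, 'discounts': 1, ...})
```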

  17. Defining features

  18. Linear Classification

  19. Linear Models for Classification • Feature function representation • Weights

  20. How can we learn weights? • By hand • Probability: e.g., Naïve Bayes • Discriminative training: e.g., perceptron, support vector machines

  21. Naïve Bayes Models for Text Classification

  22. Generative Story for Multinomial Naïve Bayes • A hypothetical stochastic process describing how training examples are generated
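A sketch of that process (the class priors and word distributions below are made up purely for illustration): first draw a label y from the class prior P(Y), then draw each word of the document independently from the class-conditional distribution P(W | Y = y):

```python
import random

# Hypothetical parameters, for illustration only
prior = {"spam": 0.4, "ham": 0.6}                       # P(Y)
word_dist = {                                           # P(W | Y)
    "spam": {"discount": 0.5, "software": 0.3, "meeting": 0.2},
    "ham":  {"discount": 0.1, "software": 0.3, "meeting": 0.6},
}

def generate_document(length=5):
    """Sample one (label, words) pair from the Naive Bayes generative story."""
    y = random.choices(list(prior), weights=list(prior.values()))[0]
    dist = word_dist[y]
    words = random.choices(list(dist), weights=list(dist.values()), k=length)
    return y, words
```

Training reverses this story: given observed (label, words) pairs, estimate the prior and the per-class word distributions.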

  23. Prediction with Naïve Bayes • Score(x, y) • Definition of conditional probability • Generative story assumptions • This is a linear model!
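The equations on this slide did not survive extraction; the standard derivation its labels refer to runs as follows (reconstructed, not verbatim). By the definition of conditional probability (Bayes rule, dropping the constant P(x)):

      ŷ = argmax_y P(y | x) = argmax_y P(y) P(x | y)

The generative story's independence assumption factors P(x | y) into a product over word positions, so taking logs:

      Score(x, y) = log P(y) + Σ_w count(w, x) · log P(w | y)

which is linear in the bag-of-words counts; hence "this is a linear model!"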

  26. Parameter Estimation • “count and normalize” • Parameters of a multinomial distribution • Relative frequency estimator • Formally: this is the maximum likelihood estimate • See CIML for derivation
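Concretely, the relative-frequency ("count and normalize") estimates for multinomial Naïve Bayes are:

      P̂(y) = count(y) / m
      P̂(w | y) = count(w, y) / Σ_w′ count(w′, y)

where m is the number of training documents, count(y) is the number of documents with label y, and count(w, y) is the number of occurrences of word w in documents labeled y.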

  27. Smoothing (add alpha)
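The smoothed estimate itself did not survive extraction; the usual add-α version adds a pseudo-count of α to every word:

      P̂(w | y) = (count(w, y) + α) / (Σ_w′ count(w′, y) + α|V|)

where |V| is the vocabulary size, so words never seen with class y get a small but nonzero probability. A minimal sketch putting estimation, smoothing, and prediction together (illustrative code, not from the slides; function names are my own):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """'Count and normalize' with add-alpha smoothing.

    docs: list of token lists; labels: parallel list of class labels.
    """
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # per-class word counts
    vocab = set()
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
        vocab.update(doc)
    m, V = len(docs), len(vocab)
    log_prior = {y: math.log(c / m) for y, c in label_counts.items()}
    log_lik = {}
    for y in label_counts:
        total = sum(word_counts[y].values())
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / (total + alpha * V))
                      for w in vocab}
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """argmax_y of log P(y) + sum over in-vocabulary words of log P(w | y)."""
    def score(y):
        return log_prior[y] + sum(log_lik[y][w] for w in doc if w in log_lik[y])
    return max(log_prior, key=score)
```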

  28. Naïve Bayes recap

  29. Why is this model called "Naïve Bayes"? • Another view of the same model:

      ŷ = argmax_y P(Y = y | X = x)
        = argmax_y P(Y = y) P(X = x | Y = y)
        = argmax_y P(Y = y) ∏_{j=1..d} P(X_j = x_j | Y = y)

      • Bayes rule (the "Bayes") + conditional independence assumption (the "naïve")

  30. Today • Text classification problems • and their evaluation • Linear classifiers • Features & Weights • Bag of words • Naïve Bayes
