Naïve Bayes & Maxent Models, CMSC 473/673, UMBC, September 18th, 2017


  1. Naïve Bayes & Maxent Models. CMSC 473/673, UMBC, September 18th, 2017. Some slides adapted from 3SLP.

  2. Announcements: Assignment 1. Due 11:59 AM, Wednesday 9/20 (< 2 days). Use the submit utility with class id cs473_ferraro and assignment id a1. We must be able to run it on GL! Common pitfall #1: forgetting files. Common pitfall #2: incorrect paths to files. Common pitfall #3: 3rd-party libraries.

  3. Announcements: Course Project. The official handout will be out Wednesday 9/20; until then, focus on Assignment 1. Teams of 1-3; mixed undergrad/grad is encouraged but not required. Some novel aspect is needed. Ex 1: reimplement an existing technique and apply it to a new domain. Ex 2: reimplement an existing technique and apply it to a new (human) language. Ex 3: explore a novel technique on an existing problem.

  4. Recap from last time…

  5. Two Different Philosophical Frameworks: Posterior Classification/Decoding (maximum a posteriori) and Noisy Channel Model Decoding. Bayes rule terms: posterior probability, likelihood, prior probability, marginal likelihood (probability). There are others too (CMSC 478/678).

  6. Posterior Decoding: Probabilistic Text Classification. Tasks: assigning subject categories, topics, or genres; spam detection; authorship identification; age/gender identification; language identification; sentiment analysis; … Bayes rule terms: class-based likelihood (language model), prior probability of class, observation likelihood (averaged over all classes). Variables: class, observed data.

  7. Noisy Channel Model. What I want to tell you: “sports”. What you actually see: “The Os lost again…”. Decode: hypothesized intent (“sad stories”, “sports”). Rerank: reweight according to what’s likely (“sports”).

  8. Noisy Channel. Tasks: machine translation, speech-to-text, spelling correction, text normalization, part-of-speech tagging, morphological analysis, image captioning, … Bayes rule terms: translation/decode model, (clean) language model, observation (noisy) likelihood. Variables: possible (clean) output, observed (noisy) text.

  9. Classify or Decode with Bayes Rule

  11. Classify or Decode with Bayes Rule: the marginal likelihood p(Y) is constant with respect to X.
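
     The slide's equation did not survive extraction. A reconstruction under the usual convention, assuming X is the label (or output) and Y is the observed text as in the later slides, would read:

```latex
\hat{X} = \arg\max_{X} p(X \mid Y)
        = \arg\max_{X} \frac{p(Y \mid X)\, p(X)}{p(Y)}
        = \arg\max_{X} p(Y \mid X)\, p(X)
```

     The last step drops p(Y) because it does not depend on X, so it cannot change which X attains the maximum.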

  15. Classify or Decode with Bayes Rule. Prior: how likely is label X overall? Likelihood: how well does text Y represent label X?

  16. Classify or Decode with Bayes Rule. Prior: how likely is label X overall? Likelihood: how well does text Y represent label X? For “simple” or “flat” labels:
      * iterate through labels
      * evaluate score for each label, keeping only the best (n best)
      * return the best (or n best) label and score
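
     The three steps for flat labels can be sketched in Python as below. This is a minimal sketch, not the course's reference code: the function name `classify`, the callables `log_prior` and `log_likelihood`, and the toy probabilities are all hypothetical stand-ins for whatever model supplies p(X) and p(Y | X).

```python
import heapq
import math

def classify(text, labels, log_prior, log_likelihood, n_best=1):
    """Score every label with log p(X) + log p(Y | X); return the n_best
    (score, label) pairs, highest score first."""
    scored = []
    for label in labels:                      # iterate through labels
        score = log_prior(label) + log_likelihood(text, label)
        scored.append((score, label))         # evaluate score for each label
    return heapq.nlargest(n_best, scored)     # keep only the best (n best)

# Hypothetical toy model: fixed probabilities for a single text.
toy_prior = {"spam": 0.4, "ham": 0.6}
toy_likelihood = {"spam": 0.05, "ham": 0.01}

best = classify(
    "free money",
    ["spam", "ham"],
    log_prior=lambda x: math.log(toy_prior[x]),
    log_likelihood=lambda y, x: math.log(toy_likelihood[x]),
)
best_label = best[0][1]
```

     Working in log space is a design choice made here to avoid underflow when many small probabilities are multiplied; it does not change which label wins.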

  17. Classify or Decode with Bayes Rule. Prior: how likely is text (complex output) X overall? Likelihood: how well does text (complex input) Y represent text (complex output) X?

  18. Classify or Decode with Bayes Rule. Prior: how likely is text (complex output) X overall? Likelihood: how well does text (complex input) Y represent text (complex output) X? For complex outputs, the following steps can be complicated:
      * iterate through labels
      * evaluate score for each label, keeping only the best (n best)
      * return the best (or n best) label and score

  19. Classify or Decode with Bayes Rule. Prior: how likely is text (complex output) X overall? Likelihood: how well does text (complex input) Y represent text (complex output) X? For complex outputs, the following steps can be complicated (we’ll come back to this in October):
      * iterate through labels
      * evaluate score for each label, keeping only the best (n best)
      * return the best (or n best) label and score

  20. Evaluation: the 2-by-2 contingency table. Columns: Actually Correct, Actually Incorrect. Rows: Selected/Guessed, Not selected/not guessed.

  21. Classification Evaluation: the 2-by-2 contingency table, computed over classes/choices. Columns: Actually Correct, Actually Incorrect. Rows: Selected/Guessed, Not selected/not guessed.

  22. Classification Evaluation: the 2-by-2 contingency table. Selected/Guessed and Actually Correct: True Positive (TP).

  23. Classification Evaluation: the 2-by-2 contingency table. Selected/Guessed and Actually Correct: True Positive (TP). Selected/Guessed and Actually Incorrect: False Positive (FP).

  24. Classification Evaluation: the 2-by-2 contingency table. Selected/Guessed and Actually Correct: True Positive (TP). Selected/Guessed and Actually Incorrect: False Positive (FP). Not selected/not guessed and Actually Correct: False Negative (FN).

  25. Classification Evaluation: the 2-by-2 contingency table, computed over classes/choices.

                                  Actually Correct       Actually Incorrect
      Selected/Guessed            True Positive (TP)     False Positive (FP)
      Not selected/not guessed    False Negative (FN)    True Negative (TN)

  26. Classification Evaluation: Accuracy, Precision, and Recall. Accuracy: % of items correct.

  27. Classification Evaluation: Accuracy, Precision, and Recall. Accuracy: % of items correct. Precision: % of selected items that are correct.

  28. Classification Evaluation: Accuracy, Precision, and Recall. Accuracy: % of items correct. Precision: % of selected items that are correct. Recall: % of correct items that are selected.

                                  Actually Correct       Actually Incorrect
      Selected/Guessed            True Positive (TP)     False Positive (FP)
      Not selected/not guessed    False Negative (FN)    True Negative (TN)
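
     In terms of the table cells, the three definitions correspond to accuracy = (TP + TN) / (TP + FP + FN + TN), precision = TP / (TP + FP), and recall = TP / (TP + FN). A small Python sketch follows; the function names and the example counts are illustrative, and returning 0.0 for an empty denominator is a convention assumed here, not something the slides specify.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all items that were handled correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    """Fraction of selected items that are actually correct."""
    selected = tp + fp
    return tp / selected if selected else 0.0

def recall(tp, fn):
    """Fraction of actually correct items that were selected."""
    correct = tp + fn
    return tp / correct if correct else 0.0

# Illustrative counts: a rare class in a collection of 1000 items.
tp, fp, fn, tn = 10, 10, 10, 970
acc = accuracy(tp, fp, fn, tn)   # (10 + 970) / 1000
prec = precision(tp, fp)         # 10 / 20
rec = recall(tp, fn)             # 10 / 20
```

     With these counts accuracy is 0.98 while precision and recall are both 0.5, which shows why accuracy alone can be misleading when the class of interest is rare.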

  29. A combined measure: F. Weighted (harmonic) average of Precision & Recall.

  30. A combined measure: F. Weighted (harmonic) average of Precision & Recall. The algebra relating the two forms of the formula is not important.

  31. A combined measure: F. Weighted (harmonic) average of Precision & Recall. Balanced F1 measure: β = 1.
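
     The slide's formula was lost in extraction. The standard definition is F = 1 / (α/P + (1 − α)/R) = (β² + 1)PR / (β²P + R), with β² = (1 − α)/α, so that β = 1 gives F1 = 2PR / (P + R). A minimal Python sketch, assuming that standard definition; the function name `f_measure` and the zero-denominator convention are choices made for this example.

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r.

    beta > 1 weights recall more heavily, beta < 1 weights precision more;
    beta = 1 is the balanced F1 measure.
    """
    b2 = beta * beta
    denom = b2 * p + r
    if denom == 0:
        return 0.0  # assumed convention when the score is undefined
    return (b2 + 1) * p * r / denom

f1_balanced = f_measure(0.5, 0.5)   # equal P and R give F1 = P = R
f1_skewed = f_measure(0.9, 0.5)     # 2 * 0.45 / 1.4
```

     Because the harmonic mean is pulled toward the smaller value, the second score sits closer to 0.5 than the arithmetic mean of 0.7 would.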

  32. Micro- vs. Macro-Averaging (Sec. 15.2.4). If we have more than one class, how do we combine multiple performance measures into one quantity? Macroaveraging: compute performance for each class, then average. Microaveraging: collect decisions for all classes, compute one contingency table, evaluate.

  33. Micro- vs. Macro-Averaging: Example (Sec. 15.2.4).

      Class 1              Truth: yes   Truth: no
      Classifier: yes          10           10
      Classifier: no           10          970

      Class 2              Truth: yes   Truth: no
      Classifier: yes          90           10
      Classifier: no           10          890

      Micro Ave. Table     Truth: yes   Truth: no
      Classifier: yes         100           20
      Classifier: no           20         1860

      Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
      Microaveraged precision: 100/120 = .83
      The microaveraged score is dominated by the score on common classes.
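
     The two averages in this example can be checked with a few lines of Python. The counts are the ones from the slide; the variable names and the dictionary layout are just one convenient way to hold them.

```python
# Per-class contingency counts from the example (only TP and FP matter for precision).
tables = {
    "class 1": {"tp": 10, "fp": 10, "fn": 10, "tn": 970},
    "class 2": {"tp": 90, "fp": 10, "fn": 10, "tn": 890},
}

def precision(tp, fp):
    return tp / (tp + fp)

# Macroaveraging: compute precision per class, then average the scores.
per_class = [precision(t["tp"], t["fp"]) for t in tables.values()]
macro_precision = sum(per_class) / len(per_class)

# Microaveraging: pool the counts into one table, then compute precision once.
pooled_tp = sum(t["tp"] for t in tables.values())
pooled_fp = sum(t["fp"] for t in tables.values())
micro_precision = precision(pooled_tp, pooled_fp)
```

     Class 2 contributes 100 of the 120 pooled "yes" decisions, which is why the microaverage lands much closer to its precision of 0.9.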

  34. Language Modeling as Naïve Bayes Classifier. Frameworks: Posterior Classification/Decoding (maximum a posteriori) and Noisy Channel Model Decoding. Bayes rule terms: posterior probability, class-based likelihood (language model), prior probability of class, observation likelihood (averaged over all classes). Variables: class, observed data.

  35. The Bag of Words Representation

  38. Bag of Words Representation. The document is reduced to word counts (seen: 2, sweet: 1, whimsical: 1, recommend: 1, happy: 1, ...), and the classifier γ maps those counts to a class: γ(document) = c.
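
     A bag of words is just a table of token counts with word order discarded. A minimal Python sketch, assuming a simple lowercase regular-expression tokenizer (the slides do not specify one) and a made-up review sentence:

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.
    Word order is thrown away; only the counts remain."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical review text chosen to reproduce counts like those on the slide.
review = ("Sweet and whimsical. I have seen it, and seen it again. "
          "I would recommend it to anyone; a happy film.")
bag = bag_of_words(review)
```

     A classifier γ would then take `bag` as its only input, so two reviews with the same counts but different word order are indistinguishable to it.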

  39. Language Modeling as Naïve Bayes Classifier Start with Bayes Rule

  40. Language Modeling as Naïve Bayes Classifier. Adopt the naïve bag of words representation Y_i.

  41. Language Modeling as Naïve Bayes Classifier. Adopt the naïve bag of words representation Y_i. Assume position doesn’t matter.

  42. Language Modeling as Naïve Bayes Classifier. Adopt the naïve bag of words representation Y_i. Assume position doesn’t matter. Assume the feature probabilities are independent given the class X.
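
     The equations for these steps did not survive extraction. Under the stated assumptions, and writing N for the number of words in the text (a symbol introduced here, not taken from the slides), the derivation would read roughly as follows:

```latex
\begin{align*}
\hat{X} &= \arg\max_{X} p(Y \mid X)\, p(X)
          && \text{Bayes rule, dropping } p(Y) \\
        &= \arg\max_{X} p(Y_1, Y_2, \ldots, Y_N \mid X)\, p(X)
          && \text{bag of words features } Y_i \\
        &= \arg\max_{X} p(X) \prod_{i=1}^{N} p(Y_i \mid X)
          && \text{features independent given } X
\end{align*}
```

     The position assumption means the same distribution p(Y_i | X) is used for every i, so only the word counts matter.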

  43. Multinomial Naïve Bayes: Learning From training corpus, extract Vocabulary

  44. Multinomial Naïve Bayes: Learning. From training corpus, extract Vocabulary.
      Calculate P(c_j) terms:
        For each c_j in C do
          docs_j = all docs with class = c_j

  45. Multinomial Naïve Bayes: Learning. From training corpus, extract Vocabulary.
      Calculate P(c_j) terms:
        For each c_j in C do
          docs_j = all docs with class = c_j
      Calculate P(w_k | c_j) terms:
        Text_j = single doc containing all docs_j
        For each word w_k in Vocabulary
          n_k = # of occurrences of w_k in Text_j
      p(w_k | c_j) = class LM
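
     The learning procedure can be sketched in Python as below. The estimators themselves were lost from the slide, so this sketch assumes the common choices: P(c_j) as the fraction of documents in class c_j, and P(w_k | c_j) with add-alpha smoothing, (n_k + alpha) / (n + alpha * |Vocabulary|), where n is the total token count of Text_j. The function name, the `alpha` parameter, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, alpha=1.0):
    """docs is a list of (tokens, label) pairs; alpha must be > 0.
    Returns (vocab, log_prior, log_likelihood)."""
    # From the training corpus, extract the vocabulary.
    vocab = {w for tokens, _ in docs for w in tokens}

    # docs_j: number of documents with class c_j.
    docs_per_class = Counter(label for _, label in docs)

    # Text_j: all documents of class c_j pooled together; counts[c][w] is n_k.
    counts = defaultdict(Counter)
    for tokens, label in docs:
        counts[label].update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}

    log_likelihood = {}
    for c in docs_per_class:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((counts[c][w] + alpha) / denom) for w in vocab
        }
    return vocab, log_prior, log_likelihood

# Hypothetical toy corpus.
toy_docs = [
    ("fun film love".split(), "pos"),
    ("love this film".split(), "pos"),
    ("boring film".split(), "neg"),
]
vocab, log_prior, log_likelihood = train_multinomial_nb(toy_docs)
```

     Each `log_likelihood[c]` is a unigram language model for class c: its probabilities sum to one over the vocabulary, which is the sense in which p(w_k | c_j) is a "class LM".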

  46. Naïve Bayes and Language Modeling. Naïve Bayes classifiers can use any sort of feature. But if, as in the previous slides, we use only word features, and we use all of the words in the text (not a subset), then Naïve Bayes has an important similarity to language modeling.

  47. Naïve Bayes as a Language Model (Sec. 13.2.1).

      Word    Positive Model   Negative Model
      I           0.1              0.2
      love        0.1              0.001
      this        0.01             0.01
      fun         0.05             0.005
      film        0.1              0.1

  48. Naïve Bayes as a Language Model (Sec. 13.2.1). Which class assigns the higher probability to s = "I love this fun film"?

      Word    Positive Model   Negative Model
      I           0.1              0.2
      love        0.1              0.001
      this        0.01             0.01
      fun         0.05             0.005
      film        0.1              0.1
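
     The question on this slide can be answered by multiplying the per-word probabilities under each model. A short Python sketch using the slide's numbers; the function name `sentence_probability` is illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

# Per-class unigram probabilities from the slide.
positive = {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
negative = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}

def sentence_probability(model, sentence):
    """Product of the model's probabilities for each whitespace-separated word."""
    return math.prod(model[word] for word in sentence.split())

s = "I love this fun film"
p_pos = sentence_probability(positive, s)
p_neg = sentence_probability(negative, s)
winner = "positive" if p_pos > p_neg else "negative"
```

     The products come to about 5e-7 for the positive model and 1e-9 for the negative model, so the positive class assigns the higher probability to s. A full classifier would also multiply in the class priors, which this slide leaves out.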
