

  1. SI485i : NLP Set 5 Using Naïve Bayes

  2. Motivation
  • We want to predict something.
  • We have some text related to this something.
  • something = target label Y
  • text = text features X
  Given X, what is the most probable Y?

  3. Motivation: Author Detection
  X = "Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly: in good faith, he cares not what mischief he does. If his weapon be out: he will foin like any devil; he will spare neither man, woman, nor child."
  Y = { Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy }
  Y ← argmax_{y_k} P(Y = y_k) P(X | Y = y_k)

  4. More Motivation
  P(Y = spam | X = email)
  P(Y = worthy | X = review sentence)

  5. The Naïve Bayes Classifier
  • Recall Bayes rule:
    P(Y_i | X_j) = P(Y_i) P(X_j | Y_i) / P(X_j)
  • Which is short for:
    P(Y = y_i | X = x_j) = P(Y = y_i) P(X = x_j | Y = y_i) / P(X = x_j)
  • We can re-write this as:
    P(Y = y_i | X = x_j) = P(Y = y_i) P(X = x_j | Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)
  Remaining slides adapted from Tom Mitchell.
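To make the normalizing sum concrete, here is a minimal Python sketch of the re-written rule. The prior and likelihood numbers are made up purely for illustration; they are not from the slides or any dataset.

```python
# Bayes rule with an explicit normalizing sum over all labels y_k.
# The probabilities below are made-up illustrative values, not real estimates.
priors = {"spam": 0.3, "ham": 0.7}            # P(Y = y_k)
likelihoods = {"spam": 0.02, "ham": 0.001}    # P(X = x_j | Y = y_k) for one fixed email x_j

evidence = sum(priors[y] * likelihoods[y] for y in priors)               # P(X = x_j)
posteriors = {y: priors[y] * likelihoods[y] / evidence for y in priors}  # P(Y = y_k | X = x_j)

print(posteriors)  # roughly {'spam': 0.90, 'ham': 0.10}
```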

  6. Deriving Naïve Bayes
  • Idea: use the training data to directly estimate P(X | Y) and P(Y).
  • We can use these values to estimate P(Y | X_new) using Bayes rule.
  • Recall that representing the full joint probability P(X | Y) = P(X_1, X_2, ..., X_n | Y) is not practical.

  7. Deriving Naïve Bayes
  • However, if we make the assumption that the attributes are independent, estimation is easy!
    P(X_1, ..., X_n | Y) = Π_i P(X_i | Y)
  • In other words, we assume all attributes are conditionally independent given Y.
  • Often this assumption is violated in practice, but more on that later…

  8. Deriving Naïve Bayes
  • Let X = (X_1, ..., X_n) and label Y be discrete.
  • Then, we can estimate P(X_i | Y) and P(Y) directly from the training data by counting!

    Sky    Temp  Humid   Wind    Water  Forecast  Play?
    sunny  warm  normal  strong  warm   same      yes
    sunny  warm  high    strong  warm   same      yes
    rainy  cold  high    strong  warm   change    no
    sunny  warm  high    strong  cool   change    yes

  P(Sky = sunny | Play = yes) = ?
  P(Humid = high | Play = yes) = ?
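Reading the answers off the four rows above (three of them have Play = yes):

  P(Sky = sunny | Play = yes) = Count(Sky = sunny, Play = yes) / Count(Play = yes) = 3/3 = 1.0
  P(Humid = high | Play = yes) = Count(Humid = high, Play = yes) / Count(Play = yes) = 2/3 ≈ 0.67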

  9. The Naïve Bayes Classifier
  • Now we have:
    P(Y = y_j | X_1, ..., X_n) = P(Y = y_j) Π_i P(X_i | Y = y_j) / Σ_k P(Y = y_k) Π_i P(X_i | Y = y_k)
  • To classify a new point X_new:
    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)

  10. The Naïve Bayes Algorithm
  • For each value y_k:
    • Estimate P(Y = y_k) from the data.
    • For each value x_ij of each attribute X_i:
      • Estimate P(X_i = x_ij | Y = y_k)
  • Classify a new point via:
    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)
  • In practice, the independence assumption doesn’t often hold true, but Naïve Bayes performs very well despite it.
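As one way to see the whole algorithm end to end, here is a minimal sketch in Python (not the course's lab code); the function names train and classify are illustrative, and the tiny dataset reuses three attribute columns from the table on slide 8.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate P(Y = y_k) and P(X_i = x_ij | Y = y_k) by counting."""
    label_counts = Counter(labels)
    # feature_counts[(i, x, y)] = how often attribute i took value x when the label was y
    feature_counts = defaultdict(int)
    for row, y in zip(rows, labels):
        for i, x in enumerate(row):
            feature_counts[(i, x, y)] += 1
    priors = {y: c / len(labels) for y, c in label_counts.items()}
    def likelihood(i, x, y):
        return feature_counts[(i, x, y)] / label_counts[y]
    return priors, likelihood

def classify(row, priors, likelihood):
    """Return argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y)."""
    def score(y):
        p = priors[y]
        for i, x in enumerate(row):
            p *= likelihood(i, x, y)
        return p
    return max(priors, key=score)

# Tiny example using the Sky/Temp/Humid columns from the slide above:
rows = [("sunny", "warm", "normal"), ("sunny", "warm", "high"),
        ("rainy", "cold", "high"), ("sunny", "warm", "high")]
labels = ["yes", "yes", "no", "yes"]
priors, likelihood = train(rows, labels)
print(classify(("sunny", "warm", "high"), priors, likelihood))  # -> yes
```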

  11. An alternate view of NB as LMs
  Y1 = dickens: P(Y1) * P(X | Y1), where P(X | Y1) = P_Y1(X)
  Y2 = twain:   P(Y2) * P(X | Y2), where P(X | Y2) = P_Y2(X)
  Bigrams:
  P_Y1(X) = Π_i P_Y1(x_i | x_{i-1})
  P_Y2(X) = Π_i P_Y2(x_i | x_{i-1})
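A sketch of this picture under an assumed interface: suppose each trained bigram model exposes a prob(word, prev) callable (a hypothetical name; the actual Lab 2 API may differ). Then NB over authors is just scoring the text under each author's model, adding the log prior, and taking the max. Working in log space avoids underflow from multiplying many small probabilities.

```python
import math

def log_score(tokens, prior, bigram_prob):
    """log P(Y) + sum_i log P_Y(x_i | x_{i-1}) under one author's bigram model."""
    total = math.log(prior)
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log(bigram_prob(word, prev))   # bigram_prob is an assumed callable
    return total

def most_likely_author(tokens, models):
    """models: {author_name: (prior, bigram_prob)}; returns the argmax author."""
    return max(models, key=lambda a: log_score(tokens, *models[a]))
```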

  12. Naïve Bayes Applications
  • Text classification
    • Which e-mails are spam?
    • Which e-mails are meeting notices?
    • Which author wrote a document?
    • Which webpages are about current events?
    • Which blog contains angry writing?
    • What sentence in a document talks about company X?
    • etc.

  13. Text and Features
    P(X_1, ..., X_n | Y) = Π_i P(X_i | Y)
  • What is X_i?
  • Could be unigrams, hopefully bigrams too.
  • It can be anything that is computed from the text X.
  • Yes, I really mean anything. Creativity and intuition into language is where the real gains come from in NLP.
  • Non n-gram examples (see the sketch below):
    • X_10 = “the number of sentences that begin with conjunctions”
    • X_356 = “existence of a semicolon in the paragraph”
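A minimal sketch of the two non-n-gram examples above as Python feature functions; the names count_conjunction_starts and has_semicolon (and the conjunction list) are made up for illustration.

```python
import re

CONJUNCTIONS = {"and", "but", "or", "so", "yet", "because"}  # illustrative list

def count_conjunction_starts(text):
    """X_10: the number of sentences that begin with a conjunction."""
    sentences = re.split(r"[.!?]+\s*", text)
    return sum(1 for s in sentences
               if s.split() and s.split()[0].lower() in CONJUNCTIONS)

def has_semicolon(paragraph):
    """X_356: existence of a semicolon in the paragraph (1 if present, else 0)."""
    return int(";" in paragraph)
```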

  14. Features
  • In machine learning, “features” are the attributes to which you assign weights (probabilities in Naïve Bayes) that help in the final classification.
  • Up until now, your features have been n-grams. You now want to consider other types of features.
  • You count features just like n-grams. How many did you see?
  • X = set of features
  • P(Y|X) = probability of a Y given a set of features

  15. How do you count features?
  • Feature idea: “a semicolon exists in this sentence”
  • Count them:
    • Count(“FEAT-SEMICOLON”, 1)
    • Make up a unique name for the feature, then count!
  • Compute probability:
    • P(“FEAT-SEMICOLON” | author=“dickens”) = Count(“FEAT-SEMICOLON”) / (# dickens sentences)
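A minimal sketch of that counting recipe, assuming you already have each author's sentences collected in a list; the feature name FEAT-SEMICOLON comes from the slide, everything else is illustrative.

```python
from collections import Counter

def count_features(sentences):
    """Count named binary features over one author's sentences, n-gram style."""
    counts = Counter()
    for sent in sentences:
        if ";" in sent:
            counts["FEAT-SEMICOLON"] += 1
    return counts

def feature_prob(feature, sentences):
    """P(feature | author) = Count(feature) / (# of that author's sentences)."""
    return count_features(sentences)[feature] / len(sentences)

# Usage: feature_prob("FEAT-SEMICOLON", dickens_sentences)  # dickens_sentences is hypothetical
```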

  16. Authorship Lab
  1. Figure out how to use your Language Models from Lab 2. They can be your initial features.
     • Can you train() a model on one author’s text?
  2. P(dickens | text) ∝ P(dickens) * P_BigramModel(text)
  3. New code for new features. Call your language models, get a probability, and then multiply new feature probabilities.
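A hedged sketch of step 3 under assumed names: bigram_model.prob(text) stands in for whatever your Lab 2 language model actually exposes, and feature_probs holds the per-author probabilities computed as on slide 15. Multiply the prior, the LM probability, and each new feature probability, then take the argmax over authors.

```python
def author_score(text, prior, bigram_model, feature_probs, text_features):
    """P(author) * P_BigramModel(text) * product of new feature probabilities."""
    score = prior * bigram_model.prob(text)      # prob() is an assumed method name
    for feat in text_features:
        score *= feature_probs.get(feat, 1.0)    # P(feat | author); skip unseen features
    return score

def predict_author(text, authors):
    """authors: {name: (prior, bigram_model, feature_probs)}; returns the argmax."""
    feats = {"FEAT-SEMICOLON"} if ";" in text else set()
    return max(authors, key=lambda a: author_score(text, *authors[a], feats))
```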
