  1. The Naïve Bayes Classifier (Machine Learning)

  2. Today’s lecture
     • The naïve Bayes classifier
     • Learning the naïve Bayes classifier
     • Practical concerns

  3. Where are we?
     We have seen Bayesian learning:
     – Using a probabilistic criterion to select a hypothesis
     – Maximum a posteriori (MAP) and maximum likelihood learning
     You should know the difference between them.
     We could also learn functions that predict probabilities of outcomes:
     – Different from using a probabilistic criterion to learn
     – Maximum a posteriori (MAP) prediction, as opposed to MAP learning

  4. MAP prediction
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     The left-hand side is the posterior probability of the label being y for this input x.

  5. MAP prediction
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y) / P(X = x)

  6. MAP prediction
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)
     The denominator P(X = x) does not depend on the label y, so it drops out of the argmax.

  7. Don’t confuse with MAP learning
     MAP learning selects a hypothesis; MAP prediction selects a label.
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)

  8. MAP prediction
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)
     P(X = x | Y = y): the likelihood of observing this input x when the label is y
     P(Y = y): the prior probability of the label being y
     All we need are these two sets of probabilities.
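
A minimal sketch of this rule in Python (not part of the deck). The function name map_predict and the dictionary layout are illustrative assumptions; the slides only require the two sets of probabilities above.

    # MAP prediction: argmax_y P(X = x | Y = y) * P(Y = y).
    # `prior` maps each label y to P(Y = y); `likelihood` maps each label y
    # to a dict from inputs x to P(X = x | Y = y).
    def map_predict(x, prior, likelihood):
        # P(X = x) is the same for every label, so the argmax ignores it.
        return max(prior, key=lambda y: likelihood[y].get(x, 0.0) * prior[y])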

  9. Example: Tennis again
     Without any other information, what is the prior probability that I should play tennis?

     Prior:
         Play tennis   P(Play tennis)
         Yes           0.3
         No            0.7

     On days that I do play tennis, what is the probability that the temperature is T and the wind is W? And on days that I don’t play tennis?

     Likelihood:
         Temperature   Wind     P(T, W | Tennis = Yes)
         Hot           Strong   0.15
         Hot           Weak     0.40
         Cold          Strong   0.10
         Cold          Weak     0.35

         Temperature   Wind     P(T, W | Tennis = No)
         Hot           Strong   0.40
         Hot           Weak     0.10
         Cold          Strong   0.30
         Cold          Weak     0.20

  10. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
      Should I play tennis?
      (prior and likelihood tables as on the previous slide)

  11. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
      Should I play tennis?
          argmax_y P(H, W | play?) P(play?)

  12. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
          argmax_y P(H, W | play?) P(play?)
          P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
          P(H, W | No) P(No) = 0.1 × 0.7 = 0.07

  13. Example: Tennis again
          P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
          P(H, W | No) P(No) = 0.1 × 0.7 = 0.07
      MAP prediction = Yes
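
As a quick check outside the deck, the same arithmetic in Python (variable names are illustrative):

    # The prior and likelihood tables from the slides, as plain dicts.
    prior = {"Yes": 0.3, "No": 0.7}
    likelihood = {
        "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.40,
                ("Cold", "Strong"): 0.10, ("Cold", "Weak"): 0.35},
        "No":  {("Hot", "Strong"): 0.40, ("Hot", "Weak"): 0.10,
                ("Cold", "Strong"): 0.30, ("Cold", "Weak"): 0.20},
    }

    x = ("Hot", "Weak")
    scores = {y: likelihood[y][x] * prior[y] for y in prior}
    print(scores)                       # {'Yes': 0.12, 'No': 0.07}, up to float rounding
    print(max(scores, key=scores.get))  # Yes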

  14. How hard is it to learn probabilistic models?
      Features:
      – Outlook: S(unny), O(vercast), R(ainy)
      – Temperature: H(ot), M(edium), C(ool)
      – Humidity: H(igh), N(ormal), L(ow)
      – Wind: S(trong), W(eak)

          #    O  T  H  W   Play?
          1    S  H  H  W   -
          2    S  H  H  S   -
          3    O  H  H  W   +
          4    R  M  H  W   +
          5    R  C  N  W   +
          6    R  C  N  S   -
          7    O  C  N  S   +
          8    S  M  H  W   -
          9    S  C  N  W   +
          10   R  M  N  W   +
          11   S  M  N  S   +
          12   O  M  H  S   +
          13   O  H  N  W   +
          14   R  M  H  S   -

  15. How hard is it to learn probabilistic models?
      We need to learn:
      1. The prior P(Play?)
      2. The likelihoods P(X | Play?)
      (data and feature values as on the previous slide)

  16. How hard is it to learn probabilistic models?
      Prior P(Play?):
      • A single number (why only one? P(Play? = -) is determined as 1 - P(Play? = +))
      Likelihood P(X | Play?):
      • There are 4 features
      • For each value of Play? (+/-), we need a value for each possible assignment: P(O, T, H, W | Play?)
      • (If all four features were binary, that would be (2⁴ − 1) parameters in each case)
      • Outlook, Temperature, and Humidity each take 3 values and Wind takes 2, so that is (3 ⋅ 3 ⋅ 3 ⋅ 2 − 1) = 53 parameters for each value of Play?, one for each assignment
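
A short count of those assignments in Python, outside the deck:

    from itertools import product

    # Possible values per feature, as listed on the slide.
    outlook = ["S", "O", "R"]        # Sunny, Overcast, Rainy
    temperature = ["H", "M", "C"]    # Hot, Medium, Cool
    humidity = ["H", "N", "L"]       # High, Normal, Low
    wind = ["S", "W"]                # Strong, Weak

    # One probability per joint assignment (O, T, H, W); one of them is
    # determined because the probabilities sum to 1, hence the "- 1".
    assignments = list(product(outlook, temperature, humidity, wind))
    print(len(assignments))          # 54
    print(len(assignments) - 1)      # 53 parameters for each value of Play?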
