  1. The Naïve Bayes Classifier (Machine Learning)

  2. Today’s lecture
     • The naïve Bayes classifier
     • Learning the naïve Bayes classifier
     • Practical concerns

  3. Where are we?
     We have seen Bayesian learning:
     – Using a probabilistic criterion to select a hypothesis
     – Maximum a posteriori (MAP) and maximum likelihood learning
     You should know the difference between them.
     We could also learn functions that predict probabilities of outcomes:
     – Different from using a probabilistic criterion to learn
     – Maximum a posteriori (MAP) prediction, as opposed to MAP learning

  4. MAP prediction
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     The left-hand side is the posterior probability of the label being y for this input x.

  5. MAP prediction
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y) / P(X = x)

  6. MAP prediction
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)
     The denominator P(X = x) does not depend on the label y, so it drops out of the argmax.

  7. Don’t confuse with MAP learning
     MAP learning selects a hypothesis; MAP prediction selects a label.
     Using the Bayes rule for predicting the label y given an input x:
         P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)

  8. MAP prediction
     Predict the label y for the input x using
         argmax_y P(X = x | Y = y) P(Y = y)
     P(X = x | Y = y): the likelihood of observing this input x when the label is y
     P(Y = y): the prior probability of the label being y
     All we need are these two sets of probabilities.
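
A minimal sketch of this rule in Python (not part of the deck). The function name map_predict and the dictionary layout are illustrative assumptions; the slides only require the two sets of probabilities above.

    # MAP prediction: argmax_y P(X = x | Y = y) * P(Y = y).
    # `prior` maps each label y to P(Y = y); `likelihood` maps each label y
    # to a dict from inputs x to P(X = x | Y = y).
    def map_predict(x, prior, likelihood):
        # P(X = x) is the same for every label, so the argmax ignores it.
        return max(prior, key=lambda y: likelihood[y].get(x, 0.0) * prior[y])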

  9. Example: Tennis again
     Without any other information, what is the prior probability that I should play tennis?

     Prior:
         Play tennis   P(Play tennis)
         Yes           0.3
         No            0.7

     On days that I do play tennis, what is the probability that the temperature is T and the wind is W? And on days that I don’t play tennis?

     Likelihood:
         Temperature   Wind     P(T, W | Tennis = Yes)
         Hot           Strong   0.15
         Hot           Weak     0.40
         Cold          Strong   0.10
         Cold          Weak     0.35

         Temperature   Wind     P(T, W | Tennis = No)
         Hot           Strong   0.40
         Hot           Weak     0.10
         Cold          Strong   0.30
         Cold          Weak     0.20

  10. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
      Should I play tennis?
      (prior and likelihood tables as on the previous slide)

  11. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
      Should I play tennis?
          argmax_y P(H, W | play?) P(play?)

  12. Example: Tennis again
      Input: Temperature = Hot (H), Wind = Weak (W)
          argmax_y P(H, W | play?) P(play?)
          P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
          P(H, W | No) P(No) = 0.1 × 0.7 = 0.07

  13. Example: Tennis again
          P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
          P(H, W | No) P(No) = 0.1 × 0.7 = 0.07
      MAP prediction = Yes
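
As a quick check outside the deck, the same arithmetic in Python (variable names are illustrative):

    # The prior and likelihood tables from the slides, as plain dicts.
    prior = {"Yes": 0.3, "No": 0.7}
    likelihood = {
        "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.40,
                ("Cold", "Strong"): 0.10, ("Cold", "Weak"): 0.35},
        "No":  {("Hot", "Strong"): 0.40, ("Hot", "Weak"): 0.10,
                ("Cold", "Strong"): 0.30, ("Cold", "Weak"): 0.20},
    }

    x = ("Hot", "Weak")
    scores = {y: likelihood[y][x] * prior[y] for y in prior}
    print(scores)                       # {'Yes': 0.12, 'No': 0.07}, up to float rounding
    print(max(scores, key=scores.get))  # Yes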

  14. How hard is it to learn probabilistic models?
      Features:
      – Outlook: S(unny), O(vercast), R(ainy)
      – Temperature: H(ot), M(edium), C(ool)
      – Humidity: H(igh), N(ormal), L(ow)
      – Wind: S(trong), W(eak)

          #    O  T  H  W   Play?
          1    S  H  H  W   -
          2    S  H  H  S   -
          3    O  H  H  W   +
          4    R  M  H  W   +
          5    R  C  N  W   +
          6    R  C  N  S   -
          7    O  C  N  S   +
          8    S  M  H  W   -
          9    S  C  N  W   +
          10   R  M  N  W   +
          11   S  M  N  S   +
          12   O  M  H  S   +
          13   O  H  N  W   +
          14   R  M  H  S   -

  15. How hard is it to learn probabilistic models?
      We need to learn:
      1. The prior P(Play?)
      2. The likelihoods P(X | Play?)
      (data and feature values as on the previous slide)

  16. How hard is it to learn probabilistic models?
      Prior P(Play?):
      • A single number (why only one? P(Play? = -) is determined as 1 - P(Play? = +))
      Likelihood P(X | Play?):
      • There are 4 features
      • For each value of Play? (+/-), we need a value for each possible assignment: P(O, T, H, W | Play?)
      • (If all four features were binary, that would be (2⁴ − 1) parameters in each case)
      • Outlook, Temperature, and Humidity each take 3 values and Wind takes 2, so that is (3 ⋅ 3 ⋅ 3 ⋅ 2 − 1) = 53 parameters for each value of Play?, one for each assignment
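
A short count of those assignments in Python, outside the deck:

    from itertools import product

    # Possible values per feature, as listed on the slide.
    outlook = ["S", "O", "R"]        # Sunny, Overcast, Rainy
    temperature = ["H", "M", "C"]    # Hot, Medium, Cool
    humidity = ["H", "N", "L"]       # High, Normal, Low
    wind = ["S", "W"]                # Strong, Weak

    # One probability per joint assignment (O, T, H, W); one of them is
    # determined because the probabilities sum to 1, hence the "- 1".
    assignments = list(product(outlook, temperature, humidity, wind))
    print(len(assignments))          # 54
    print(len(assignments) - 1)      # 53 parameters for each value of Play?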
