5. Bayesian decision theory


  1. Foundations of Machine Learning — CentraleSupélec, Fall 2017. 5. Bayesian decision theory. Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech, chloe-agathe.azencott@mines-paristech.fr

  2. Practical matters... ● I do not grade homework that is sent as .docx ● (Partial) solutions to Lab 2 are at the end of the slides of Chap 4.

  3. Learning objectives. After this lecture, you should be able to: ● Apply Bayes' rule for simple inference and decision problems; ● Explain the connection between Bayes' decision rule, empirical risk minimization, maximum a posteriori and maximum likelihood; ● Apply the Naive Bayes algorithm.

  4. Let's start by tossing coins...

  5–10. Probability and inference ● Result of tossing a coin: x in {heads, tails}. – x = f(z), where z denotes unobserved variables: e.g. a complex physical function of the composition of the coin, the force applied to it, the initial conditions, etc. – Replace f(z) (possibly deterministic, but unknown) with a random variable X in {0, 1} drawn from a probability distribution P(X = x): a Bernoulli distribution. ● We do not know P, only a sample drawn from it. ● Goal: approximate P (from which X is drawn) by p0 = #heads / #tosses. ● Prediction of the next toss: heads if p0 > 0.5, tails otherwise.
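
A minimal Python sketch of this estimation and prediction step, assuming a simulated sample of tosses (the true parameter 0.6, the sample size and the variable names below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of tosses: 1 = heads, 0 = tails, drawn from an unknown
# Bernoulli distribution (simulated here with p = 0.6 purely for illustration).
tosses = rng.binomial(1, 0.6, size=100)

# Estimate of P(X = heads): the fraction of heads in the sample.
p0 = tosses.mean()

# Prediction for the next toss: heads if p0 > 0.5, tails otherwise.
prediction = "heads" if p0 > 0.5 else "tails"
print(f"p0 = {p0:.2f} -> predict {prediction}")
```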

  11. Classification: cat vs. dog – Cat = 1 (positive), Dog = 0 (negative) – x1 = human contact, x2 = good eater ● Prediction: [scatter plot of cats and dogs in the (human contact, good eater) plane]

  12. Bayes rule

  13. Reverend Thomas Bayes, 170?–1761 … possibly

  14. Bayes rule

  15–20. Example: rare disease testing – the test is correct 99% of the time – disease prevalence = 1 out of 10,000. What is the probability that a patient who tested positive actually has the disease? 99%? 90%? 10%? 1%? By Bayes' rule, P(disease | positive) = P(positive | disease) P(disease) / [P(positive | disease) P(disease) + P(positive | no disease) P(no disease)] = (0.99 × 0.0001) / (0.99 × 0.0001 + (1 − 0.99) × (1 − 0.0001)).
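
A short Python check of this computation, assuming (as the slide suggests) that "correct 99% of the time" applies to both diseased and healthy patients:

```python
# Bayes' rule for the rare-disease example above.
sensitivity = 0.99      # P(test positive | disease)
specificity = 0.99      # P(test negative | no disease)
prevalence = 0.0001     # P(disease) = 1 / 10,000

# Evidence: P(+) = P(+ | disease) P(disease) + P(+ | no disease) P(no disease)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Posterior: P(disease | positive)
posterior = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {posterior:.4f}")   # ~0.0098, i.e. about 1%
```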

  21. Bayes' rule: posterior = likelihood × prior / evidence, i.e. P(y | x) = P(x | y) P(y) / P(x). Bayes' decision rule: assign x to the class with the largest posterior.

  22–23. Maximum A Posteriori criterion ● MAP decision rule: – pick the hypothesis that is most probable, – i.e. maximize the posterior. ● Decision rule: if Λ_MAP(x) > 1 then choose y=1, else choose y=0.
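
The ratio itself appeared only as an image on the slide; a standard way to write the MAP ratio implied by the decision rule above, in LaTeX, is:

```latex
\Lambda_{\mathrm{MAP}}(x)
  = \frac{P(y=1 \mid x)}{P(y=0 \mid x)}
  = \frac{P(x \mid y=1)\, P(y=1)}{P(x \mid y=0)\, P(y=0)},
\qquad
\text{choose } y=1 \text{ if } \Lambda_{\mathrm{MAP}}(x) > 1, \text{ else } y=0 .
```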

  24–25. Likelihood ratio test (LRT) ● p(x) does not affect the decision rule. ● Likelihood ratio test: test whether the likelihood ratio Λ(x) = P(x | y=1) / P(x | y=0) is larger than the prior ratio P(y=0) / P(y=1); if so, choose y=1, else y=0.
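
Since the evidence p(x) cancels in the posterior ratio, the test can be written in the usual LRT form (a reconstruction; the slide's own formula was an image):

```latex
\Lambda(x) = \frac{P(x \mid y=1)}{P(x \mid y=0)} > \frac{P(y=0)}{P(y=1)}
\;\Longrightarrow\; \text{choose } y=1, \quad \text{otherwise choose } y=0 .
```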

  26–29. Example: LRT decision rule. Assuming the likelihoods shown on the slide and equal priors, derive a decision rule based on the LRT. ● Write the likelihood ratio, simplify, and take the log. ● Equal priors mean we are testing whether log Λ(x) > 0. Hence: if x < 7 then assign y=1, else assign y=0. ● Now assume P(y=1) = 2 P(y=0): the threshold becomes x < 7 − log(1/2) ≈ 7.69, since y=1 is now a priori more likely and its decision region grows. (One set of likelihoods consistent with these thresholds is sketched below.)
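
A Python sketch of this example. The slide's likelihood plots did not survive extraction, so the Gaussian parameters below (means 6.5 and 7.5, unit variance) are assumptions; they are one choice consistent with the thresholds 7 and ≈ 7.69 quoted above:

```python
import numpy as np

# Assumed class-conditional densities: p(x | y=1) = N(6.5, 1), p(x | y=0) = N(7.5, 1).
mu1, mu0, sigma = 6.5, 7.5, 1.0

def log_likelihood_ratio(x):
    """log[ p(x | y=1) / p(x | y=0) ]; the Gaussian normalisation constants cancel."""
    return (-(x - mu1) ** 2 + (x - mu0) ** 2) / (2 * sigma ** 2)

def lrt_decision(x, p1=0.5, p0=0.5):
    """Choose y=1 iff the log-likelihood ratio exceeds log(P(y=0) / P(y=1))."""
    return 1 if log_likelihood_ratio(x) > np.log(p0 / p1) else 0

# Equal priors: the decision boundary is at x = 7.
print(lrt_decision(6.9), lrt_decision(7.1))        # -> 1 0
# P(y=1) = 2 P(y=0): the boundary shifts to x = 7 - log(1/2) ≈ 7.69.
print(lrt_decision(7.5, p1=2/3, p0=1/3))           # -> 1
```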

  30. Maximum likelihood criterion ● Consider equal priors: P(y=1) = P(y=0). ● Bayes' decision rule then amounts to maximizing the likelihood P(x | y=c) and is hence called the maximum likelihood criterion. – Decision rule: if Λ_ML(x) = P(x | y=1) / P(x | y=0) > 1 then choose y=1, else choose y=0.

  31–33. Bayes' rule for K > 2 classes ● Bayes' rule: P(y=c_k | x) = P(x | y=c_k) P(y=c_k) / Σ_l P(x | y=c_l) P(y=c_l). ● Decision: assign x to the class with the largest posterior, k = argmax_l P(y=c_l | x) (see the sketch below).
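
A minimal Python sketch of the K-class decision rule; the priors and likelihood values are made up for illustration:

```python
import numpy as np

# Illustrative priors P(y = c_k) and likelihoods p(x | y = c_k) evaluated
# at a single observation x, for K = 3 classes.
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.1, 0.4, 0.3])

# Bayes' rule: posterior is proportional to likelihood * prior; p(x) normalises.
posteriors = likelihoods * priors
posteriors /= posteriors.sum()

# Decision: assign x to the class with the largest posterior.
k_hat = int(np.argmax(posteriors))
print(posteriors, "-> choose class", k_hat)
```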

  34. Risk minimization

  35. Losses and risks ● So far we have assumed all errors were equally costly. But misclassifying a cancer sufferer as a healthy patient is much more problematic than the other way around. ● Action α_k: assigning class c_k. ● Loss: quantify the cost λ_kl of taking action α_k when the true class is c_l. ● Expected risk: R(α_k | x) = Σ_l λ_kl P(c_l | x). ● Decision (Bayes classifier): take the action with minimum expected risk (a code sketch follows).
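
A small Python sketch of the expected-risk computation; the loss matrix and posterior values are illustrative assumptions, chosen so that the costly "sick called healthy" error dominates the decision:

```python
import numpy as np

# loss[k, l] = lambda_kl: cost of taking action alpha_k (predict class c_k)
# when the true class is c_l. Classes: 0 = healthy, 1 = sick (assumed example).
loss = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

# Posterior P(c_l | x) for one observation (illustrative values).
posteriors = np.array([0.9, 0.1])

# Expected risk of each action: R(alpha_k | x) = sum_l lambda_kl P(c_l | x).
risks = loss @ posteriors

# Bayes classifier: take the action with minimum expected risk.
k_hat = int(np.argmin(risks))
print(risks, "-> take action", k_hat)    # risks = [1.0, 0.9] -> predict "sick"
```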

  36–37. Discriminant functions ● Classification = find K discriminant functions f_k such that x is assigned class c_k if k = argmax_l f_l(x). ● Bayes classifier: f_k(x) = −R(α_k | x). ● This defines K decision regions. [Figure: decision regions for sports car, luxury sedan and family car in the (x1 = price, x2 = engine power) plane]

  38–39. Bayes risk minimization ● Bayes risk: the overall expected risk. ● Bayes decision rule: use the discriminant functions that minimize the Bayes risk. ● This is also an LRT: for 2 classes, the Bayes decision rule is equivalent to a likelihood ratio test (a derivation is sketched below).
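
The requested derivation did not survive extraction; a conventional version, using the loss notation λ_kl from slide 35 and assuming λ_01 > λ_11, is:

```latex
\begin{aligned}
\text{choose } y=1
 &\iff R(\alpha_1 \mid x) < R(\alpha_0 \mid x) \\
 &\iff \lambda_{10} P(y=0 \mid x) + \lambda_{11} P(y=1 \mid x)
     < \lambda_{00} P(y=0 \mid x) + \lambda_{01} P(y=1 \mid x) \\
 &\iff (\lambda_{01} - \lambda_{11})\, P(y=1 \mid x) > (\lambda_{10} - \lambda_{00})\, P(y=0 \mid x) \\
 &\iff \Lambda(x) = \frac{p(x \mid y=1)}{p(x \mid y=0)}
     > \frac{(\lambda_{10} - \lambda_{00})\, P(y=0)}{(\lambda_{01} - \lambda_{11})\, P(y=1)} .
\end{aligned}
```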

  40. 0/1 loss ● All misclassifications are equally costly: λ_kl = 0 if k = l and 1 otherwise. ● Minimizing the risk: – choose the most probable class (MAP); – this is equivalent to the Bayes decision rule.

  41–42. Maximum likelihood criterion ● Consider equal priors P(y=1) = P(y=0) and the 0/1 loss function. ● In the LRT threshold, the loss ratio equals 1 (0/1 loss) and the prior ratio equals 1 (equal priors), so the Bayes decision rule reduces to the maximum likelihood criterion: choose y=1 if Λ(x) > 1.
