
Classification: Introduction. David Dalpiaz, STAT 430, Fall 2017



  1. Classification: Introduction. David Dalpiaz, STAT 430, Fall 2017

  2. Announcements
      • BVT Review
      • Regression Questions?
      • General Questions?

  3. Statistical Learning
      • Supervised Learning
          • Regression
          • Classification
      • Unsupervised Learning

  4. Classification versus Regression
      Regression: $Y \in \mathbb{R}$
      Classification: $Y \in \mathcal{G} = \{1, 2, 3, \ldots, g\}$

  5. American Bulldog [Figure 1: A Good Dog]

  6. American Bulldog [Figure 2: A Good Dog]

  7. American Bulldog [Figure 3: A Good Dog]

  8. American Bulldog [Figure 4: A Good Dog]

  9. Golden Retriever [Figure 5: A Good Dog]

  10. Golden Retriever [Figure 6: A Good Dog]

  11. Golden Retriever [Figure 7: A Good Dog]

  12. Golden Retriever [Figure 8: Some Good Dogs]

  13. Dog Probabilities
      Let's make some assumptions about dogs. Let
      • $Y$ be the species, either $B$ for American Bulldog or $R$ for Golden Retriever
      • $W$ be the weight of the dog, in kg
      • $H$ be the height of the dog, in cm
      The two species are equally likely a priori: $\pi_B = P(Y = B) = 0.5 = P(Y = R) = \pi_R$.

  14. Dog Probabilities
      American Bulldog: $H \mid B \sim N(\mu = 56, \sigma^2 = 1.5^2)$, $W \mid B \sim N(\mu = 34, \sigma^2 = 2.25^2)$
      Golden Retriever: $H \mid R \sim N(\mu = 53, \sigma^2 = 1.5^2)$, $W \mid R \sim N(\mu = 32, \sigma^2 = 0.75^2)$
      Let's also assume that $W$ and $H$ are conditionally independent given the species ($B$ or $R$). (However, this is unrealistic.)
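Conditional independence means each class-conditional joint density of (height, weight) is just the product of the two univariate normal densities, which is the same as a bivariate normal with a diagonal covariance matrix. The sketch below checks this numerically in base R. It assumes the parameter values from the R code slide (Golden Retriever weight mean 32); `dog_density` and `biv_normal_density` are hypothetical helper names.

```r
# Class-conditional joint density of (height, weight), assuming conditional
# independence given the species. Parameters follow the R code slide
# (assumption: Golden Retriever weight mean is 32).
dog_density = function(h, w, class) {
  if (class == "B") {
    dnorm(h, mean = 56, sd = 1.5) * dnorm(w, mean = 34, sd = 2.25)
  } else {
    dnorm(h, mean = 53, sd = 1.5) * dnorm(w, mean = 32, sd = 0.75)
  }
}

# The same density written as a bivariate normal with a general covariance
# matrix (hypothetical helper, base R only)
biv_normal_density = function(x, mu, sigma) {
  d = x - mu
  quad = drop(t(d) %*% solve(sigma) %*% d)
  exp(-0.5 * quad) / (2 * pi * sqrt(det(sigma)))
}

# Evaluate both forms at one (height, weight) point for the bulldog class
x = c(55, 33)
f_b = dog_density(x[1], x[2], "B")
f_b_mvn = biv_normal_density(x, c(56, 34), diag(c(1.5 ^ 2, 2.25 ^ 2)))
all.equal(f_b, f_b_mvn)
```

Because the covariance matrix is diagonal, its determinant is the product of the variances and its inverse is diagonal, so the exponent splits into one term per feature; that is why the two forms agree.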

  15. Dog Parameters
      # ht, wt
      # cm, kg
      b_mu    = c(56, 34)
      b_sigma = matrix(c(1.5 ^ 2, 0, 0, 2.25 ^ 2), 2, 2)
      r_mu    = c(53, 32)
      r_sigma = matrix(c(1.5 ^ 2, 0, 0, 0.75 ^ 2), 2, 2)

  16. Weight Distribution [Figure: class-conditional densities of weight for Golden Retriever and American Bulldog; x-axis: Weight, y-axis: Density]

  17. Bayes Classifier
      $C^B(x) = \underset{k}{\operatorname{argmax}} \; P(Y = k \mid X = x)$
      Decision Boundary: $\{x : P(Y = B \mid X = x) = P(Y = R \mid X = x)\}$
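Because the model is fully specified, the Bayes classifier can be computed directly: multiply each prior by its class-conditional density, normalize to get the posterior $P(Y = k \mid X = x)$, and pick the largest. This is a minimal sketch assuming equal priors, conditional independence, and the parameters from the R code slide; `posterior` and `bayes_classify` are hypothetical names.

```r
# Sketch: Bayes classifier for the known dog model.
# Assumptions: pi_B = pi_R = 0.5, height and weight conditionally
# independent, parameters from the R code slide (r_mu = c(53, 32)).
priors = c(B = 0.5, R = 0.5)

# Class-conditional densities f_k(h, w) for k in {B, R}
class_density = function(h, w) {
  c(B = dnorm(h, 56, 1.5) * dnorm(w, 34, 2.25),
    R = dnorm(h, 53, 1.5) * dnorm(w, 32, 0.75))
}

# Posterior P(Y = k | X = x) via Bayes' theorem
posterior = function(h, w) {
  joint = priors * class_density(h, w)
  joint / sum(joint)
}

# Bayes classifier: the class with the largest posterior probability
bayes_classify = function(h, w) {
  names(which.max(posterior(h, w)))
}

posterior(55, 33)
bayes_classify(55, 33)
```

With equal priors the prior terms cancel, so the rule reduces to choosing the class with the larger class-conditional density; keeping the priors explicit makes it easy to try unequal ones.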

  18. Weight, Bayes Decision Boundary [Figure: weight densities for Golden Retriever and American Bulldog with the Bayes decision boundary marked; x-axis: Weight, y-axis: Density]
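Using weight alone, with equal priors the Bayes decision boundary is where the two class-conditional densities are equal. Since the variances differ, the log-density ratio is quadratic in $w$, so there are two crossing points. The sketch below finds them numerically with `uniroot`, assuming the weight parameters from the R code slide ($W \mid B \sim N(34, 2.25^2)$, $W \mid R \sim N(32, 0.75^2)$).

```r
# Sketch: weight-only Bayes decision boundary, assuming equal priors and
# W | B ~ N(34, 2.25^2), W | R ~ N(32, 0.75^2) (R code slide values).
# The boundary is where the log density ratio equals zero.
log_ratio = function(w) {
  dnorm(w, mean = 32, sd = 0.75, log = TRUE) -
    dnorm(w, mean = 34, sd = 2.25, log = TRUE)
}

# The ratio is positive near the retriever mean (32) and negative far
# from it, so search for one root on each side of 32.
lower = uniroot(log_ratio, interval = c(25, 32))$root
upper = uniroot(log_ratio, interval = c(32, 40))$root
c(lower = lower, upper = upper)
```

Solving the quadratic by hand puts the roots near 30.35 kg and 33.15 kg: weights between them are classified as Golden Retriever and weights outside them as American Bulldog, because the bulldog's wider distribution dominates in both tails.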

  19. Height Distribution [Figure: class-conditional densities of height for Golden Retriever and American Bulldog; x-axis: Height, y-axis: Density]

  20. Height Distribution, Decision Boundary [Figure: height densities for Golden Retriever and American Bulldog with the Bayes decision boundary marked; x-axis: Height, y-axis: Density]
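For height, both classes share the same variance ($1.5^2$), so the normalizing constants cancel and, with equal priors, the Bayes decision boundary sits exactly halfway between the two means. A short worked derivation, assuming the height parameters stated for $H \mid B$ and $H \mid R$:

```latex
\frac{1}{\sqrt{2\pi}\,(1.5)} \exp\!\left(-\frac{(h - 56)^2}{2 \cdot 1.5^2}\right)
  = \frac{1}{\sqrt{2\pi}\,(1.5)} \exp\!\left(-\frac{(h - 53)^2}{2 \cdot 1.5^2}\right)
\iff (h - 56)^2 = (h - 53)^2
\iff h - 56 = -(h - 53)
\iff h = \frac{56 + 53}{2} = 54.5
```

Dogs taller than 54.5 cm are classified as American Bulldog and shorter ones as Golden Retriever.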

  21. Height and Weight Distribution [Figure: joint distribution of height and weight for Golden Retriever and American Bulldog; x-axis: Height, y-axis: Weight]

  22. Let's Make Some Dogs
      # simulate n_obs dogs: first half American Bulldogs, second half Golden Retrievers
      # requires the mvtnorm package for rmvnorm()
      sim_dog_data = function(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma) {
        species = c(rep("American Bulldog", n_obs / 2),
                    rep("Golden Retriever", n_obs / 2))
        # each row is one (height, weight) draw from the class's bivariate normal
        ht_wt = rbind(mvtnorm::rmvnorm(n = n_obs / 2, mean = b_mu, sigma = b_sigma),
                      mvtnorm::rmvnorm(n = n_obs / 2, mean = r_mu, sigma = r_sigma))
        data.frame(species, height = ht_wt[, 1], weight = ht_wt[, 2])
      }
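Since both covariance matrices are diagonal, the same kind of data can be simulated without `mvtnorm` by drawing height and weight independently with `rnorm()`. This is a hypothetical base-R variant (`sim_dog_data_base`, which takes standard deviations instead of covariance matrices); it follows the same model but will not reproduce the exact draws that `mvtnorm::rmvnorm()` gives for the same seed.

```r
# Hypothetical base-R variant of sim_dog_data. Valid only because the
# covariance matrices are diagonal (height and weight conditionally
# independent given species). b_sd and r_sd are standard deviations.
sim_dog_data_base = function(n_obs = 200, b_mu, b_sd, r_mu, r_sd) {
  n_half = n_obs / 2
  species = c(rep("American Bulldog", n_half), rep("Golden Retriever", n_half))
  # draw each feature separately for each class
  height = c(rnorm(n_half, b_mu[1], b_sd[1]), rnorm(n_half, r_mu[1], r_sd[1]))
  weight = c(rnorm(n_half, b_mu[2], b_sd[2]), rnorm(n_half, r_mu[2], r_sd[2]))
  data.frame(species, height, weight)
}

set.seed(66)
dogs = sim_dog_data_base(n_obs = 200,
                         b_mu = c(56, 34), b_sd = c(1.5, 2.25),
                         r_mu = c(53, 32), r_sd = c(1.5, 0.75))
table(dogs$species)
```

The `mvtnorm` version generalizes to correlated features, which is the natural next step once the unrealistic independence assumption is dropped.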

  23. Let's Make Some Dogs
      set.seed(66)
      dog_trn = sim_dog_data(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma)
      dog_tst = sim_dog_data(n_obs = 800, b_mu, b_sigma, r_mu, r_sigma)

  24. Simulated Dogs, Univariate Density Estimates [Figure: estimated densities of height and weight for American Bulldog and Golden Retriever, with rug marks for the simulated observations; x-axis: Feature, y-axis: Density]

  25. Simulated Dogs, Bivariate Density Estimates [Figure: scatter plot matrix of simulated height and weight for American Bulldog and Golden Retriever]

  26. Simulated Train Dogs, Decision? [Figure: training data, weight vs. height, by species (Golden Retriever, American Bulldog)]

  27. Simulated Test Dogs, Decision? [Figure: test data, weight vs. height, by species (Golden Retriever, American Bulldog)]

  28. Classification Error
      $I(y_i \neq \hat{C}(x_i)) = \begin{cases} 1 & y_i \neq \hat{C}(x_i) \\ 0 & y_i = \hat{C}(x_i) \end{cases}$
      $\text{Err}(\hat{C}, \text{Data}) = \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{C}(x_i))$
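In R the indicator $I(y_i \neq \hat{C}(x_i))$ is just a logical comparison, and averaging a logical vector treats `TRUE` as 1 and `FALSE` as 0, so the error rate is a one-liner. A minimal sketch; `calc_class_err` is a hypothetical helper name and the labels are toy values.

```r
# Empirical misclassification rate: (1/n) * sum of I(y_i != C(x_i)).
# calc_class_err is a hypothetical helper name.
calc_class_err = function(actual, predicted) {
  mean(actual != predicted)
}

# Toy labels: one mismatch out of four
actual    = c("B", "R", "R", "B")
predicted = c("B", "B", "R", "B")
calc_class_err(actual, predicted)  # 0.25
```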

  29. Decision Boundary?
      $w = 142.1745763 - 2.0006502 \cdot h$

      C = function(data) {
        with(data, ifelse(weight > (142.1746 - 2.00065 * height),
                          "American Bulldog", "Golden Retriever"))
      }

      # train error
      mean(C(dog_trn) != dog_trn$species)
      ## [1] 0.125

      # test error
      mean(C(dog_tst) != dog_tst$species)
      ## [1] 0.11
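One way to judge the linear rule is to compare it with the Bayes classifier on the same data. The sketch below is self-contained and makes several assumptions: it simulates fresh data with base-R `rnorm()` (so the draws differ from the slides' `dog_tst` and the errors will not match the numbers shown), uses equal priors, and uses the parameters from the R code slide. `C_linear` and `C_bayes` are hypothetical names.

```r
# Sketch: linear rule vs. Bayes classifier on freshly simulated dogs.
# Assumptions: base-R simulation, equal priors, R code slide parameters.
set.seed(42)
n_half = 400
dogs = data.frame(
  species = rep(c("American Bulldog", "Golden Retriever"), each = n_half),
  height  = c(rnorm(n_half, 56, 1.5), rnorm(n_half, 53, 1.5)),
  weight  = c(rnorm(n_half, 34, 2.25), rnorm(n_half, 32, 0.75))
)

# Linear rule from the slide: bulldog above the line w = 142.1746 - 2.00065 h
C_linear = function(data) {
  with(data, ifelse(weight > (142.1746 - 2.00065 * height),
                    "American Bulldog", "Golden Retriever"))
}

# Bayes classifier: with equal priors, pick the larger class-conditional density
C_bayes = function(data) {
  f_b = with(data, dnorm(height, 56, 1.5) * dnorm(weight, 34, 2.25))
  f_r = with(data, dnorm(height, 53, 1.5) * dnorm(weight, 32, 0.75))
  ifelse(f_b > f_r, "American Bulldog", "Golden Retriever")
}

err_linear = mean(C_linear(dogs) != dogs$species)
err_bayes  = mean(C_bayes(dogs) != dogs$species)
c(linear = err_linear, bayes = err_bayes)
```

The Bayes classifier minimizes the expected misclassification rate under the assumed model, but on any finite sample either rule can come out slightly ahead, so a single comparison like this is only indicative.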
