Classification: Introduction
David Dalpiaz
STAT 430, Fall 2017

Announcements
• BVT Review
• Regression Questions?
• General Questions?

Statistical Learning
• Supervised Learning
  • Regression
  • Classification
• Unsupervised Learning

Classification versus Regression
Regression: $Y \in \mathbb{R}$
Classification: $Y \in G = \{1, 2, 3, \ldots, g\}$

American Bulldog
[Figure 1: A Good Dog]
[Figure 2: A Good Dog]
[Figure 3: A Good Dog]
[Figure 4: A Good Dog]

Golden Retriever
[Figure 5: A Good Dog]
[Figure 6: A Good Dog]
[Figure 7: A Good Dog]
[Figure 8: Some Good Dogs]

Dog Probabilities
Let’s make some assumptions about dogs. Let
• Y be the species, either B for American Bulldog or R for Golden Retriever
• W be the weight of the dog, in kg
• H be the height of the dog, in cm

$$\pi_B = P(Y = B) = 0.5, \qquad \pi_R = P(Y = R) = 0.5$$

Dog Probabilities
American Bulldog
$H \mid B \sim N(\mu = 56, \sigma^2 = 1.5^2)$
$W \mid B \sim N(\mu = 34, \sigma^2 = 2.25^2)$
Golden Retriever
$H \mid R \sim N(\mu = 53, \sigma^2 = 1.5^2)$
$W \mid R \sim N(\mu = 32, \sigma^2 = 0.75^2)$
Let’s also assume that W and H are conditionally independent given the species, B or R. (However, this is unrealistic.)

Dog Parameters

    # order of entries: height (cm), weight (kg)
    b_mu = c(56, 34)
    b_sigma = matrix(c(1.5 ^ 2, 0, 0, 2.25 ^ 2), 2, 2)
    r_mu = c(53, 32)
    r_sigma = matrix(c(1.5 ^ 2, 0, 0, 0.75 ^ 2), 2, 2)

Weight Distribution
[Figure: density curves of Weight for Golden Retriever and American Bulldog]

Bayes Classifier
$$C^B(x) = \underset{k}{\operatorname{argmax}} \ P(Y = k \mid X = x)$$
Decision Boundary
$$\{ x : P(Y = B \mid X = x) = P(Y = R \mid X = x) \}$$

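To make the definition concrete, here is a minimal sketch of the Bayes classifier for the dog model, assuming the equal priors, the normal distributions, and the conditional independence stated on the earlier slides. The retriever weight mean of 32 follows the `r_mu` parameter vector. The names `posterior_bulldog` and `bayes_classify` are hypothetical helpers, not part of the original slides.

```r
# parameters from the slides: heights in cm, weights in kg, equal priors
prior = c(B = 0.5, R = 0.5)
ht_mu = c(B = 56, R = 53)
ht_sd = c(B = 1.5, R = 1.5)
wt_mu = c(B = 34, R = 32)
wt_sd = c(B = 2.25, R = 0.75)

# hypothetical helper: P(Y = B | H = height, W = weight) via Bayes' rule
posterior_bulldog = function(height, weight) {
  # conditional independence lets the joint density factor into two normals
  lik_b = dnorm(height, ht_mu[["B"]], ht_sd[["B"]]) *
    dnorm(weight, wt_mu[["B"]], wt_sd[["B"]])
  lik_r = dnorm(height, ht_mu[["R"]], ht_sd[["R"]]) *
    dnorm(weight, wt_mu[["R"]], wt_sd[["R"]])
  prior[["B"]] * lik_b / (prior[["B"]] * lik_b + prior[["R"]] * lik_r)
}

# hypothetical helper: pick the class with the larger posterior probability
bayes_classify = function(height, weight) {
  ifelse(posterior_bulldog(height, weight) > 0.5,
         "American Bulldog", "Golden Retriever")
}

# a tall, heavy dog and a short, light dog
bayes_classify(height = c(57, 52), weight = c(36, 31))
```

With only two classes, taking the argmax is the same as checking whether the bulldog posterior exceeds 0.5, which is what the sketch does.
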
Weight Bayes, Decision Boundary
[Figure: density curves of Weight for Golden Retriever and American Bulldog, with the Bayes decision boundary marked]

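The weight-only boundary can be located numerically. The sketch below assumes the weight distributions from the parameter code (bulldog mean 34 and sd 2.25, retriever mean 32 and sd 0.75) and equal priors, so the boundary is where the two class densities are equal. The function name `wt_dens_diff` and the search intervals are choices made for this illustration.

```r
# difference between the class-conditional weight densities;
# with equal priors, its roots are the Bayes decision boundary
wt_dens_diff = function(w) {
  dnorm(w, mean = 34, sd = 2.25) - dnorm(w, mean = 32, sd = 0.75)
}

# the variances differ, so the densities cross twice:
# once below the retriever mean and once above it
lower = uniroot(wt_dens_diff, interval = c(28, 32), tol = 1e-8)$root
upper = uniroot(wt_dens_diff, interval = c(32, 34), tol = 1e-8)$root

c(lower = lower, upper = upper)
```

Under these assumptions a dog is classified as a Golden Retriever when its weight falls between the two roots (roughly 30.35 kg and 33.15 kg) and as an American Bulldog otherwise.
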
Height Distribution
[Figure: density curves of Height for Golden Retriever and American Bulldog]

Height Distribution, Decision Boundary
[Figure: density curves of Height for Golden Retriever and American Bulldog, with the Bayes decision boundary marked]

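For height, the boundary can be worked out by hand. A short derivation, assuming the equal priors and the common height variance $1.5^2$ from the earlier slides, with $f_{H \mid k}$ denoting the normal density of height for species $k$:

$$\tfrac{1}{2} f_{H \mid B}(h) = \tfrac{1}{2} f_{H \mid R}(h) \iff (h - 56)^2 = (h - 53)^2 \iff h = \frac{56 + 53}{2} = 54.5$$

So, using height alone, a dog taller than 54.5 cm would be classified as an American Bulldog. Because the two variances are equal, there is a single boundary point at the midpoint of the means.
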
Height and Weight Distribution
[Figure: joint distribution of Height (horizontal axis) and Weight (vertical axis) for Golden Retriever and American Bulldog]

Let’s Make Some Dogs

    # simulate n_obs dogs, half of each species, from the bivariate normal models
    sim_dog_data = function(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma) {
      species = c(rep("American Bulldog", n_obs / 2),
                  rep("Golden Retriever", n_obs / 2))
      # columns of ht_wt: height (cm), weight (kg)
      ht_wt = rbind(
        mvtnorm::rmvnorm(n = n_obs / 2, mean = b_mu, sigma = b_sigma),
        mvtnorm::rmvnorm(n = n_obs / 2, mean = r_mu, sigma = r_sigma)
      )
      data.frame(species, height = ht_wt[, 1], weight = ht_wt[, 2])
    }

Let’s Make Some Dogs

    set.seed(66)
    # 200 training dogs and 800 test dogs
    dog_trn = sim_dog_data(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma)
    dog_tst = sim_dog_data(n_obs = 800, b_mu, b_sigma, r_mu, r_sigma)

Simulated Dogs, Univariate Density Estimates
[Figure: estimated densities of height and weight, one panel per feature, for American Bulldog and Golden Retriever]

Simulated Dogs, Bivariate Density Estimates
[Figure: scatter plot matrix of height and weight for American Bulldog and Golden Retriever]

Simulated Train Dogs, Decision?
[Figure: train data, weight versus height, for Golden Retriever and American Bulldog]

Simulated Test Dogs, Decision?
[Figure: test data, weight versus height, for Golden Retriever and American Bulldog]

Classification Error
$$I(y_i \neq \hat{C}(x_i)) = \begin{cases} 1 & y_i \neq \hat{C}(x_i) \\ 0 & y_i = \hat{C}(x_i) \end{cases}$$
$$\text{Err}(\hat{C}, \text{Data}) = \frac{1}{n} \sum_{i = 1}^{n} I(y_i \neq \hat{C}(x_i))$$

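In R, the average of the indicator is simply the mean of a logical comparison. A minimal sketch, where the function name `calc_class_err` and the toy label vectors are made up for illustration:

```r
# proportion of observations whose predicted label differs from the actual label
calc_class_err = function(actual, predicted) {
  mean(actual != predicted)
}

# toy example: one of four labels is wrong
actual    = c("B", "B", "R", "R")
predicted = c("B", "R", "R", "R")
calc_class_err(actual, predicted)  # 0.25
```
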
Decision Boundary?
$$w = 142.1745763 - 2.0006502 \cdot h$$

    # classify as American Bulldog when the dog lies above the line
    C = function(data) {
      with(data, ifelse(weight > (142.1746 - 2.00065 * height),
                        "American Bulldog", "Golden Retriever"))
    }

    # train error
    mean(C(dog_trn) != dog_trn$species)
    ## [1] 0.125

    # test error
    mean(C(dog_tst) != dog_tst$species)
    ## [1] 0.11

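For context, the linear rule can be compared with the Bayes rule on a large simulated sample. The sketch below is an illustration under stated assumptions, not the slides' own analysis: it uses base R `rnorm` instead of `mvtnorm::rmvnorm` (valid here because the covariance matrices are diagonal), an arbitrary seed, and a sample size chosen for this example.

```r
set.seed(42)  # arbitrary seed, not from the slides
n = 10000     # simulated dogs per species (an assumption for this sketch)

# independent normal draws match the diagonal covariance matrices
sim = data.frame(
  species = rep(c("American Bulldog", "Golden Retriever"), each = n),
  height  = c(rnorm(n, 56, 1.5),  rnorm(n, 53, 1.5)),
  weight  = c(rnorm(n, 34, 2.25), rnorm(n, 32, 0.75))
)

# linear rule from the slide
linear_pred = with(sim, ifelse(weight > (142.1746 - 2.00065 * height),
                               "American Bulldog", "Golden Retriever"))

# Bayes rule: with equal priors, pick the species with the larger joint density
dens_b = with(sim, dnorm(height, 56, 1.5) * dnorm(weight, 34, 2.25))
dens_r = with(sim, dnorm(height, 53, 1.5) * dnorm(weight, 32, 0.75))
bayes_pred = ifelse(dens_b > dens_r, "American Bulldog", "Golden Retriever")

# estimated misclassification rates of the two rules
err = c(linear = mean(linear_pred != sim$species),
        bayes  = mean(bayes_pred != sim$species))
err
```

Since the Bayes classifier minimizes the expected misclassification rate for this model, its estimated error serves as a rough lower benchmark for any other rule, including the linear one.
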