Introduction to Data Science: Logistic Regression


Linear models for classification · Classifier evaluation · Summary


  1. Introduction to Data Science: Logistic Regression
Héctor Corrada Bravo, University of Maryland, College Park, USA. 2020-04-05.

Outline: Linear models for classification; Classifier evaluation; Summary.

Linear models for classification. The general classification setting is: can we predict a categorical response/output Y from a set of predictors X1, X2, …, Xp? As in the regression case, we assume training data (x1, y1), …, (xn, yn); in this case, however, the responses yi are categorical and take one of a fixed set of values.

An example classification problem: an individual's choice of transportation mode to commute to work. Predictors: income, cost and time required for each of the alternatives: driving/carpooling, biking, taking a bus, taking the train. Response: whether the individual makes their commute by car, bike, bus or train.

Why not linear regression? For categorical responses with more than two values, if order and scale (units) don't make sense, then it is not a regression problem and linear regression doesn't work. For binary (0/1) responses it's a little better: we could use linear regression, interpret the response as related to class probability (since linear regression estimates E[Y | X = x] directly), and classify based on this.

Classification as a probability estimation problem. Instead of modeling classes 0 or 1 directly, we model the conditional class probability p(x) = P(Y = 1 | X = x) and classify based on this probability. In general, classification approaches use discriminant (think of scoring) functions to do classification; logistic regression is one way of estimating the class probability p(x).

Logistic regression. The basic idea behind logistic regression is to build a linear model related to the class probability; it is a classification analog to linear regression. Instead of modeling p(x) directly, we build a linear model of the log-odds:

    log[ p(x) / (1 − p(x)) ] = β0 + β1 x

so logistic regression partitions the predictor space with linear functions. The parameters β0 and β1 are chosen according to a data fit criterion: logistic regression learns them using Maximum Likelihood, maximizing the likelihood of the observed training data under the Bernoulli probability model we saw previously (think of flipping a coin weighted by p(x)). Usually we do this by minimizing the negative of the log-likelihood of the model, i.e. we solve the following optimization problem:

    min over β0, β1 of  Σi [ −yi f(xi) + log(1 + e^{f(xi)}) ],  where f(xi) = β0 + β1 xi.

This is a non-linear (but convex) optimization problem, so an algorithm (numerical optimization) is required to estimate the parameters (a short sketch of this computation follows this overview).

Here is how we compute a logistic regression model in R, modeling credit default as a function of account balance:

default_fit <- glm(default ~ balance, data=Default, family=binomial)
default_fit %>% tidy()

## # A tibble: 2 x 5
##   term   estimate  std.error statistic
## 1 (Int…  -1.07e+1  0.361         -29.5
## 2 bala…   5.50e-3  0.000220       25.0

Interpretation of logistic regression models is slightly different than for the linear regression model we looked at: coefficients act on the log-odds, so the odds that a person defaults, P(Y = Yes | X) / P(Y = No | X), change multiplicatively by a factor of e^{β̂1} for every dollar in their account balance (e.g., e^{0.05} ≈ 1.051). As before, the accuracy of β̂1 as an estimate of the population parameter is given by its standard error, and we can again construct a confidence interval for this estimate as we have done before. As before, we can do hypothesis testing of a relationship between account balance and the probability of default. In this case we use a Z-statistic, β̂1 / SE(β̂1), which plays the role of the t-statistic in linear regression: a scaled measure of our estimate (signal / noise). As before, the P-value is the probability of seeing a Z-value as large as the one observed (e.g., 24.95) under the null hypothesis β1 = 0 that there is no relationship between balance and the probability of defaulting.

Consider an example of single logistic regression of default vs. student status:

fit1 <- glm(default ~ student, data=Default, family="binomial")
fit1 %>% tidy()

## # A tibble: 2 x 5
##   term   estimate std.error statistic  p.value
## 1 (Int…    -3.50     0.0707     -49.6  0.
## 2 stud…     0.405    0.115        3.52 4.31e-4

Making predictions. We can use a learned logistic regression model to make predictions. E.g., "on average, the probability that a person with a balance of $1,000 defaults is":

    p̂(1000) = e^{β̂0 + β̂1×1000} / (1 + e^{β̂0 + β̂1×1000}) = e^{−10.6514 + 0.0055×1000} / (1 + e^{−10.6514 + 0.0055×1000}) ≈ 0.00576

Multiple logistic regression. As in multiple linear regression, we instead build a linear model of the log-odds on several predictors:

    log[ p(x) / (1 − p(x)) ] = β0 + β1 x1 + ⋯ + βp xp

As in multiple linear regression, it is essential to avoid confounding!

fit <- glm(default ~ balance + income + student, data=Default, family="binomial")
fit %>% tidy()

## # A tibble: 4 x 5
##   term   estimate  std.error statistic
## 1 (Int…  -1.09e+1   4.92e-1     -22.1
## 2 bala…   5.74e-3   2.32e-4      24.7
## 3 inco…   3.03e-6   8.20e-6       0.370
## 4 stud…  -6.47e-1   2.36e-1      -2.74
## # … with 1 more variable: p.value <dbl>

Classifier evaluation. How do we determine how well classifiers are performing? One way is to compute the error rate of the classifier: the percentage of mistakes it makes when predicting class. But error and accuracy statistics are not enough to understand classifier performance; we need a more precise language to describe classification mistakes:

                        True Class +         True Class −         Total
    Predicted Class +   True Positive (TP)   False Positive (FP)  P*
    Predicted Class −   False Negative (FN)  True Negative (TN)   N*
    Total               P                    N

Using these we can define statistics that describe classifier performance:

    Name                              Definition   Synonyms
    True Positive Rate (TPR)          TP / P       sensitivity, recall, 1 − Type-II error, power
    False Positive Rate (FPR)         FP / N       Type-I error, 1 − Specificity
    Positive Predictive Value (PPV)   TP / P*      precision, 1 − false discovery proportion
    Negative Predictive Value (NPV)   TN / N*

Remember we are classifying Yes if p̂(x) > 0.5, i.e. if log[ p(x) / (1 − p(x)) ] > 0 ⇒ P(Y = Yes | X) > 0.5. This leads to a natural question: can we adjust our classifier's TPR and FPR? What would happen if we instead classified Yes when P(Y = Yes | X) > 0.2? In the credit default case we may want to increase TPR (recall: make sure we catch all defaults) at the expense of FPR (1 − Specificity: clients we lose because we think they will default). Classifications can thus be done using probability cutoffs to trade off, e.g., TPR vs. FPR (ROC curve) or precision vs. recall (PR curve); a sketch of this cutoff computation also follows this overview. A way of describing the TPR and FPR tradeoff is the ROC curve (Receiver Operating Characteristic) and the AUROC (area under the ROC); another metric frequently used to understand classification errors and tradeoffs is the precision-recall curve. The area under the ROC or PR curve summarizes classifier performance across different cutoffs.

Summary. We approach classification as a class probability estimation problem. Logistic regression partitions the predictor space with linear functions and learns its parameters by maximum likelihood.
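As an aside on the "numerical optimization" step referenced above, here is a minimal sketch (not from the slides) of minimizing the negative log-likelihood directly with R's general-purpose optimizer and checking that it matches glm. It assumes the ISLR package's Default data set; the balance predictor is rescaled to thousands of dollars purely to make the optimization better conditioned.

library(ISLR)                                # provides the Default data set

y <- as.numeric(Default$default == "Yes")    # 0/1 response
x <- Default$balance / 1000                  # balance in $1000s (rescaled for conditioning)

# Negative log-likelihood: sum_i [ -y_i f(x_i) + log(1 + exp(f(x_i))) ], with f(x) = b0 + b1 x
negloglik <- function(beta) {
  f <- beta[1] + beta[2] * x
  sum(-y * f + pmax(f, 0) + log1p(exp(-abs(f))))   # numerically stable log(1 + exp(f))
}

optim(c(0, 0), negloglik, method = "BFGS")$par     # numerical minimization
coef(glm(y ~ x, family = binomial))                # glm on the same rescaled predictor, for comparison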
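To make the cutoff discussion concrete, here is a minimal sketch (again not from the slides) that computes TPR and FPR on the Default data for two probability cutoffs; the helper name rate_at_cutoff is introduced here for illustration only.

library(ISLR)

default_fit <- glm(default ~ balance, data = Default, family = binomial)
probs <- predict(default_fit, type = "response")   # p_hat(x) = P(Y = Yes | X = x)

# TPR and FPR when classifying "Yes" whenever p_hat(x) > cutoff
rate_at_cutoff <- function(cutoff) {
  pred <- probs > cutoff
  c(TPR = sum(pred & Default$default == "Yes") / sum(Default$default == "Yes"),
    FPR = sum(pred & Default$default == "No")  / sum(Default$default == "No"))
}

rate_at_cutoff(0.5)   # the default p_hat(x) > 0.5 rule
rate_at_cutoff(0.2)   # lower cutoff: higher TPR, at the cost of higher FPR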

  2. Linear models for classification
The general classification setting is: can we predict a categorical response/output Y from a set of predictors X1, X2, …, Xp? As in the regression case, we assume training data (x1, y1), …, (xn, yn); in this case, however, the responses yi are categorical and take one of a fixed set of values.
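As a concrete illustration (not part of this slide), the Default credit-card data from the ISLR package, which the later slides fit models to, has exactly this structure: a categorical response together with numeric and categorical predictors. Assuming the ISLR package is installed:

library(ISLR)            # provides the Default data set used in the logistic regression slides

str(Default$default)     # the response Y: a factor with levels "No" / "Yes"
table(Default$default)   # counts for each class
head(Default[, c("student", "balance", "income")])   # the predictors X1, ..., Xp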

  3. Linear models for classification

  4. Linear models for classification: An example classification problem
An individual's choice of transportation mode to commute to work. Predictors: income, cost and time required for each of the alternatives: driving/carpooling, biking, taking a bus, taking the train. Response: whether the individual makes their commute by car, bike, bus or train.
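The rest of the deck develops binary logistic regression, so one way to make this example concrete is to model a single binary contrast, e.g. whether or not the commute is made by car. The data below is entirely made up for illustration; neither it nor this particular model comes from the slides.

# Hypothetical commute data (made-up values, illustration only)
set.seed(1)
n <- 200
commute <- data.frame(
  income   = round(runif(n, 20, 120)),     # annual income in $1000s (made up)
  cost_car = round(runif(n, 2, 15), 1),    # daily cost of driving, in $ (made up)
  time_bus = round(runif(n, 20, 90)),      # bus commute time, in minutes (made up)
  mode     = sample(c("car", "bike", "bus", "train"), n, replace = TRUE)
)

# Binary logistic regression for one contrast: commuting by car vs. any other mode
car_fit <- glm(I(mode == "car") ~ income + cost_car + time_bus,
               data = commute, family = binomial)
summary(car_fit)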
