Multiple logistic regression Richard Erickson Instructor DataCamp - PowerPoint PPT Presentation

DataCamp Generalized Linear Models in R
Multiple logistic regression
Richard Erickson, Instructor

Chapter overview
Multiple logistic regression
Formulas in R
Model assumptions

Why multiple regression?
Problem: Multiple predictor variables. Which one should I include?
Solution: Include all of them using multiple regression.

Multiple predictor variables
Simple linear models or simple GLM:
  Limited to 1 slope and 1 intercept
  $y \sim \beta_0 + \beta_1 x + \epsilon$
Multiple regression:
  Multiple slopes and intercepts
  $y \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \epsilon$

Too much of a good thing
Theoretical maximum number of coefficients:
  Number of $\beta$s = Number of samples
Over-fitting: Using too many predictors compared to the number of samples
Practical maximum number of coefficients:
  Number of $\beta$s $\times$ 10 $\approx$ Number of samples
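The practical rule of thumb above is simple arithmetic, so it can be wrapped in a small helper. This is only a sketch: `max_practical_coefs()` and its `samples_per_coef` argument are hypothetical names introduced here, and the factor of 10 is the slide's rule of thumb rather than a hard limit.

```r
# Hypothetical helper: rough ceiling on the number of coefficients
# under the "about 10 samples per coefficient" rule of thumb.
max_practical_coefs <- function(n_samples, samples_per_coef = 10) {
  floor(n_samples / samples_per_coef)
}

max_practical_coefs(250)   # 25 coefficients for 250 samples
max_practical_coefs(32)    # 3 coefficients for 32 samples
```

Note that the intercept counts as one of the coefficients.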

Bus data: Two possible predictors
With the bus commuter data, there are 2 possible predictors:
  Number of days one commutes: CommuteDays
  Distance of commute: MilesOneWay
Possible to build a model with both:
  glm(Bus ~ CommuteDays + MilesOneWay, data = bus, family = 'binomial')

Summary of GLM with multiple predictors

Call:
glm(formula = Bus ~ CommuteDays + MilesOneWay, family = "binomial",
    data = bus)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0732  -0.9035  -0.7816   1.3968   2.5066

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.707515   0.119719  -5.910 3.42e-09 ***
CommuteDays  0.066084   0.023181   2.851  0.00436 **
MilesOneWay -0.059571   0.003218 -18.512  < 2e-16 ***
#...
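The `bus` data set is not included in this transcript, so the following sketch fits the same formula to simulated data. Everything about `bus_sim` is an assumption: the sample size, the predictor ranges, and the "true" coefficients were chosen only so the example runs, and its estimates will not match the summary shown for the real data.

```r
set.seed(42)
n <- 500

# Simulated stand-in for the course's `bus` data (all values are made up)
bus_sim <- data.frame(
  CommuteDays = sample(1:7, n, replace = TRUE),
  MilesOneWay = runif(n, min = 0, max = 40)
)

# Assumed coefficients on the log-odds scale, used only to generate a response
eta <- -0.7 + 0.07 * bus_sim$CommuteDays - 0.06 * bus_sim$MilesOneWay
bus_sim$Bus <- rbinom(n, size = 1, prob = plogis(eta))

# Multiple logistic regression: one slope per predictor plus an intercept
fit <- glm(Bus ~ CommuteDays + MilesOneWay, data = bus_sim, family = "binomial")

coef(summary(fit))  # estimate, std. error, z value and p-value per coefficient
exp(coef(fit))      # the same coefficients expressed as odds ratios
```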

Correlation between predictors

Order of coefficients
No correlation between predictors:
  Order not important
  $y \sim x_1 + x_2 + \epsilon \approx y \sim x_2 + x_1 + \epsilon$
Correlation between predictors:
  Order may change estimates
  $y \sim x_1 + x_2 + \epsilon \neq y \sim x_2 + x_1 + \epsilon$
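In base R, one place where order-dependence shows up clearly is the sequential analysis-of-deviance table from `anova()`, where each term is assessed after the terms listed before it. The sketch below uses simulated, deliberately correlated predictors (`x1`, `x2` and their effect sizes are assumptions made for illustration). In this sketch the fitted coefficients agree across the two orderings, while the deviance attributed to `x1` does not.

```r
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # x2 is built to correlate with x1
y  <- rbinom(n, size = 1, prob = plogis(0.5 * x1 + 0.5 * x2))

fit_12 <- glm(y ~ x1 + x2, family = "binomial")
fit_21 <- glm(y ~ x2 + x1, family = "binomial")

# Sequential analysis of deviance: terms are added in formula order
dev_12 <- anova(fit_12, test = "Chisq")
dev_21 <- anova(fit_21, test = "Chisq")

dev_12["x1", "Deviance"]  # x1 entered first
dev_21["x1", "Deviance"]  # x1 entered after x2

cor(x1, x2)               # how strongly the simulated predictors are correlated
```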

Let's practice!

Formulas in R
Richard Erickson, Instructor

Why care about formulas for multiple logistic regression?
Formulas: backbone of regression
Tricky to figure out
Understanding model.matrix() is key

Slopes
Estimates coefficient for continuous variable
  e.g., height = c(72.3, 21.1, 3.7, 1.0)
Formula also requires a global intercept
Multiple slopes: Slope for each predictor

Intercepts
Discrete groups used to predict
factor or character in R: fish = c("red", "blue")
Single intercept has two options:
  Reference intercept + contrast: y ~ x
  Intercept for each group: y ~ x - 1

Multiple intercepts
Estimates effect of each group compared to reference group
  Reference group: alphabetically the first
Default has one reference group per variable:
  y ~ x1 + x2
Can specify one group to estimate an intercept for all groups:
  y ~ x1 + x2 - 1
  First variable has intercept estimated for each group
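The two intercept codings can be compared on a small simulated example. The `fish` groups and their success probabilities below are made up for illustration. Both fits describe the same model, only parameterized differently.

```r
set.seed(7)
fish <- factor(rep(c("blue", "red"), each = 50))

# Made-up outcome: the "red" group has the higher success probability
y <- rbinom(100, size = 1, prob = ifelse(fish == "red", 0.7, 0.3))

fit_ref   <- glm(y ~ fish,     family = "binomial")  # reference intercept + contrast
fit_group <- glm(y ~ fish - 1, family = "binomial")  # one intercept per group

coef(fit_ref)    # (Intercept): log-odds for blue; fishred: red minus blue
coef(fit_group)  # fishblue and fishred: log-odds for each group

# The contrast coding recovers the per-group intercept for red
sum(coef(fit_ref))
```

The last line adds the reference intercept and the contrast, which reproduces the `fishred` value from the per-group fit up to numerical tolerance.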

Dummy variables
Codes group membership
Used under the hood (i.e., model.matrix())
0s and 1s for each group
Example input: colors = c("red", "blue")
Dummy variables for y ~ colors (blue, alphabetically first, is the reference):
  intercept = c(1, 1)
  red = c(1, 0)
Dummy variables for y ~ colors - 1:
  red = c(1, 0)
  blue = c(0, 1)

model.matrix()
model.matrix() does legwork for us
Foundation for formulas in R

> model.matrix( ~ colors)
  (Intercept) colorsred
1           1         1
2           1         0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$colors
[1] "contr.treatment"

Order determined by factor order
Change order with Tidyverse or factor()
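Changing the reference group with `factor()` can be sketched as follows, reusing the two-element `colors` vector from the slide. The names `colors_f` and `colors_r` are introduced here for illustration only.

```r
colors <- c("red", "blue")

# Default: levels are sorted alphabetically, so "blue" is the reference
colnames(model.matrix(~ colors))      # "(Intercept)" "colorsred"

# Make "red" the reference by giving the level order explicitly
colors_f <- factor(colors, levels = c("red", "blue"))
colnames(model.matrix(~ colors_f))    # "(Intercept)" "colors_fblue"

# relevel() does the same when only the reference needs to move
colors_r <- relevel(factor(colors), ref = "red")
levels(colors_r)                      # "red" "blue"
```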

Factor vs numeric caveat

R thinks variable is numeric
e.g., month = c(1, 2, 3)

> month <- c(1, 2, 3)
> model.matrix( ~ month)
  (Intercept) month
1           1     1
2           1     2
3           1     3
attr(,"assign")
[1] 0 1

Need to specify factor or character
e.g., month = factor(c(1, 2, 3))

> model.matrix( ~ month)
  (Intercept) month2 month3
1           1      0      0
2           1      1      0
3           1      0      1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$month
[1] "contr.treatment"

Assumptions of multiple logistic regression
Richard Erickson, Instructor

Assumptions
Limitations also apply to Poisson and other GLMs
Important assumptions:
  Simpson's paradox
  Linear, monotonic
  Independence
  Overdispersion

Example: Simpson's paradox

Simpson's paradox
Key points:
  Missing important predictor
  Inclusion changes outcome
  Easy to visualize with lm()

Simpson's paradox and admission data
Admissions data:
  University of California Berkeley
  Graduate admission
  Rate of admission by department and gender
Does bias exist?
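Base R ships a Berkeley admissions table as `UCBAdmissions`, which allows a sketch of the paradox with logistic regression. Whether this is the exact data set used in the course is an assumption, and the variable names `adm`, `fit_gender` and `fit_dept` are introduced here.

```r
# UCBAdmissions is a 3-way table in the base R datasets package
adm <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq
adm$Admitted <- as.numeric(adm$Admit == "Admitted")

# Gender only: department is the missing predictor
fit_gender <- glm(Admitted ~ Gender, weights = Freq,
                  data = adm, family = "binomial")

# Gender plus department
fit_dept <- glm(Admitted ~ Gender + Dept, weights = Freq,
                data = adm, family = "binomial")

coef(fit_gender)["GenderFemale"]  # negative: lower odds of admission overall
coef(fit_dept)["GenderFemale"]    # small and positive once Dept is included
```

Here `Freq` is used as a frequency weight, so each row of the table stands for that many applicants. The sign of the gender coefficient flips once department is in the model, which is the pattern the slide describes.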


Independence
Predictors:
  If all independent, order has no effect on estimates
  If non-independent, order can change estimates
Response:
  What is unit of focus?
  Individual, groups, group of groups?
  Test scores: Individual student? Teacher? School? District?

Overdispersion
Too many zeros or ones (Binomial)
Too many zeros, too large variance (Poisson)
Variance changes
Beyond scope of this course
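Although the slides leave overdispersion out of scope, a common first check for a Poisson GLM is the Pearson dispersion statistic. The sketch below uses simulated negative binomial counts, with all parameter values assumed for illustration, so that the Poisson fit is overdispersed by construction.

```r
set.seed(123)
n <- 500
x <- rnorm(n)

# Negative binomial counts: variance exceeds the mean
y <- rnbinom(n, size = 1, mu = exp(1 + 0.3 * x))

fit_pois <- glm(y ~ x, family = "poisson")

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# The quasipoisson family reports the same quantity as its dispersion estimate
fit_quasi <- glm(y ~ x, family = "quasipoisson")
summary(fit_quasi)$dispersion
```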

Conclusion
Richard Erickson, Instructor

What you've learned
How GLM extends LM:
  Poisson error term
  Binomial error term
Understanding and plotting results
GLM with multiple regression

Where to from here?
DataCamp Multiple (linear) regression course (if you missed it)
Extending to include random effects with hierarchical and mixed-effect models
Fit generalized additive models (GAMs) to non-linear models
Decide what coefficients to use with model selection such as AIC
Many other types of regression
Searching and R package documentation to learn more

Happy coding!
