Multivariable logistic regression


  1. Multivariable logistic regression
     GENERALIZED LINEAR MODELS IN PYTHON
     Ita Cirovic Donev, Data Science Consultant

  2. Multivariable setting
     Model formula: $\operatorname{logit}(y) = \beta_0 + \beta_1 x_1$

  4. Multivariable setting
     Model formula: $\operatorname{logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

  5. Multivariable setting
     Model formula: $\operatorname{logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
     In Python:
         import statsmodels.api as sm
         from statsmodels.formula.api import glm

         model = glm('y ~ x1 + x2 + x3 + x4', data = my_data,
                     family = sm.families.Binomial()).fit()

  6. Example - well switching
         formula = 'switch ~ distance100 + arsenic'
         wells_fit = glm(formula = formula, data = wells,
                         family = sm.families.Binomial()).fit()

         ===============================================================================
                           coef    std err          z      P>|z|      [0.025      0.975]
         -------------------------------------------------------------------------------
         Intercept       0.0027      0.079      0.035      0.972      -0.153       0.158
         distance100    -0.8966      0.104     -8.593      0.000      -1.101      -0.692
         arsenic         0.4608      0.041     11.134      0.000       0.380       0.542
         ===============================================================================

  7. Example - well switching
                           coef    std err          z      P>|z|      [0.025      0.975]
         -------------------------------------------------------------------------------
         Intercept       0.0027      0.079      0.035      0.972      -0.153       0.158
         distance100    -0.8966      0.104     -8.593      0.000      -1.101      -0.692
         arsenic         0.4608      0.041     11.134      0.000       0.380       0.542

     - Both coefficients are statistically significant
     - Sign of coefficients is logical
     - A unit change in distance100 corresponds to a negative difference of 0.89 in the logit
     - A unit change in arsenic corresponds to a positive difference of 0.46 in the logit
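Differences in the logit are easier to read once exponentiated, since a change of b in the log-odds multiplies the odds by exp(b). The sketch below applies that standard identity to the two coefficients from the table; the odds-ratio reading is an addition here and is not stated on the slide.

```python
import math

# Coefficients copied from the well-switching summary table.
coefs = {"distance100": -0.8966, "arsenic": 0.4608}

# exp(coefficient) is the multiplicative change in the odds of switching
# for a one-unit increase in that predictor, holding the other fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.3f} per unit increase")
```

A ratio below 1 (distance100) means the odds of switching shrink as distance grows. A ratio above 1 (arsenic) means they grow with the arsenic level.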

  8. Impact of adding a variable
     Model without arsenic:
                           coef    std err
         ---------------------------------
         Intercept       0.6060      0.060
         distance100    -0.6291      0.097

     Model with arsenic:
                           coef    std err
         ---------------------------------
         Intercept       0.0027      0.079
         distance100    -0.8966      0.104
         arsenic         0.4608      0.041

     Impact of the arsenic variable:
     - distance100 changes from -0.62 to -0.89
     - Households further away from the safe well are more likely to have higher arsenic levels

  9. Multicollinearity
     - Variables that are correlated with other model variables
     - Increase in standard errors of coefficients
     - Coefficients may not be statistically significant
     [1] https://en.wikipedia.org/wiki/Correlation_and_dependence

  10. Presence of multicollinearity?
      What to look for:
      - Coefficient is not significant, but the variable is highly correlated with y
      - Adding or removing a variable significantly changes the coefficients
      - Sign of the coefficient is not logical
      - Variables have high pairwise correlation

  11. Variance inflation factor (VIF)
      - Most widely used diagnostic for multicollinearity
      - Computed for each explanatory variable
      - Measures how inflated the variance of the coefficient is
      - Suggested threshold: VIF > 2.5
      In Python:
          from statsmodels.stats.outliers_influence import variance_inflation_factor

  12. Let's practice!

  13. Comparing models

  14. Deviance
      Formula: $D = -2\,LL(\beta)$
      - Measure of error
      - Lower deviance → better model fit
      - Benchmark for comparison is the null deviance → intercept-only model
      Evaluate:
      - Adding a random noise variable would, on average, decrease deviance by 1
      - Adding p predictors to the model, deviance should decrease by more than p

  15. Deviance in Python

  16. Compute deviance
      Extract null deviance and deviance:
          # Extract null deviance
          print(model.null_deviance)
          4118.0992

          # Extract model deviance
          print(model.deviance)
          4076.2378

      Compute deviance using log likelihood:
          print(-2*model.llf)
          4076.2378

      - Reduction in deviance by 41.86
      - Including distance100 improved the fit

  17. Model complexity
      - model_1 and model_2, where $L_1 > L_2$
      - Number of parameters is higher in model_2
      - model_2 is overfitting

  18. Let's practice!

  19. Model formula

  20. Formula and model matrix

  24. Model matrix
      Model matrix: y ∼ X
      Model formula: 'y ~ x1 + x2'
      Check model matrix structure:
          from patsy import dmatrix
          dmatrix('x1 + x2')

            Intercept  x1  x2
                    1   1   4
                    1   2   5
                    1   3   6

  25. Variable transformation
          import numpy as np
          'y ~ x1 + np.log(x2)'
          dmatrix('x1 + np.log(x2)')

          DesignMatrix with shape (3, 3)
            Intercept  x1  np.log(x2)
                    1   1     1.38629
                    1   2     1.60944
                    1   3     1.79176

  26. Centering and standardization
      Stateful transforms:
          'y ~ center(x1) + standardize(x2)'
          dmatrix('center(x1) + standardize(x2)')

          DesignMatrix with shape (3, 3)
            Intercept  center(x1)  standardize(x2)
                    1          -1         -1.22474
                    1           0          0.00000
                    1           1          1.22474
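The numbers in the design matrix above can be reproduced by hand. This sketch assumes the same toy vectors used throughout the deck (x1 = 1, 2, 3 and x2 = 4, 5, 6) and passes them in a dictionary named `toy`, which is an illustrative choice. It compares patsy's `center` and `standardize` against plain NumPy arithmetic.

```python
import numpy as np
from patsy import dmatrix

# Toy vectors matching the slide's example values.
toy = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

dm = np.asarray(dmatrix("center(x1) + standardize(x2)", toy))

# center(x): subtract the mean.
manual_center = toy["x1"] - toy["x1"].mean()
# standardize(x): subtract the mean and divide by the standard deviation
# (NumPy's default population standard deviation, ddof=0).
manual_standardize = (toy["x2"] - toy["x2"].mean()) / toy["x2"].std()

print(dm[:, 1], manual_center)
print(dm[:, 2], manual_standardize)
```

Both columns agree with the manual computation, which also shows that the -1.22474 on the slide comes from dividing by the population standard deviation.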

  27. Build your own transformation
          def my_transformation(x):
              return 4 * x

          dmatrix('x1 + x2 + my_transformation(x2)')

          DesignMatrix with shape (3, 4)
            Intercept  x1  x2  my_transformation(x2)
                    1   1   4                     16
                    1   2   5                     20
                    1   3   6                     24

  28. Arithmetic operations
      With NumPy arrays, + inside I() adds element-wise:
          x1 = np.array([1, 2, 3])
          x2 = np.array([4, 5, 6])
          dmatrix('I(x1 + x2)')

          DesignMatrix with shape (3, 2)
            Intercept  I(x1 + x2)
                    1           5
                    1           7
                    1           9

      With Python lists, + inside I() concatenates:
          x1 = [1, 2, 3]
          x2 = [4, 5, 6]
          dmatrix('I(x1 + x2)')

          DesignMatrix with shape (6, 2)
            Intercept  I(x1 + x2)
                    1           1
                    1           2
                    1           3
                    1           4
                    1           5
                    1           6
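The array-versus-list difference is easy to trip over, so a runnable version may help. In this sketch the variables are passed through dictionaries (named `as_arrays` and `as_lists` for illustration) instead of being picked up from the surrounding namespace.

```python
import numpy as np
from patsy import dmatrix

as_arrays = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}
as_lists = {"x1": [1, 2, 3], "x2": [4, 5, 6]}

# I() evaluates its contents as ordinary Python, so "+" means whatever
# the operand types define: element-wise sum for arrays, concatenation for lists.
dm_sum = np.asarray(dmatrix("I(x1 + x2)", as_arrays))
dm_concat = np.asarray(dmatrix("I(x1 + x2)", as_lists))

print(dm_sum.shape, dm_sum[:, 1])
print(dm_concat.shape, dm_concat[:, 1])
```

Converting inputs to NumPy arrays (or pandas columns) before building the formula avoids the silent change in the number of rows.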

  29. Coding the categorical data

  32. Patsy coding
      - Strings and booleans are automatically coded
      - Numerical → categorical: C() function
      - Reference group
        - Default: first group
        - Treatment
        - levels

  33. The C() function
      Numeric variable:
          dmatrix('color', data = crab)

          DesignMatrix with shape (173, 2)
            Intercept  color
                    1      2
                    1      3
                    1      1
            [... rows omitted]

      How many levels?
          crab['color'].value_counts()

          2    95
          3    44
          4    22
          1    12

  34. The C() function
      Categorical variable:
          dmatrix('C(color)', data = crab)

          DesignMatrix with shape (173, 4)
            Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
                    1              1              0              0
                    1              0              1              0
                    1              0              0              0
            [... rows omitted]

  35. Changing the reference group
          dmatrix('C(color, Treatment(4))', data = crab)

          DesignMatrix with shape (173, 4)
            Intercept  C(color)[T.1]  C(color)[T.2]  C(color)[T.3]
                    1              0              1              0
                    1              0              0              1
                    1              1              0              0
            [... rows omitted]

  36. Changing the reference group
          l = [1, 2, 3, 4]
          dmatrix('C(color, levels = l)', data = crab)

          DesignMatrix with shape (173, 4)
            Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
                    1              1              0              0
                    1              0              1              0
                    1              0              0              0
            [... rows omitted]

  37. Multiple intercepts
          'y ~ C(color)-1'
          dmatrix('C(color)-1', data = crab)

          DesignMatrix with shape (173, 4)
            C(color)[1]  C(color)[2]  C(color)[3]  C(color)[4]
                      0            1            0            0
                      0            0            1            0
                      1            0            0            0
            [... rows omitted]
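The three codings from slides 34 to 37 can be compared side by side. The crab data set is not reproduced in this transcript, so the sketch uses a tiny stand-in frame named `demo` with an invented `color` column taking the same four levels.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data: a numeric column with levels 1-4.
demo = pd.DataFrame({"color": [2, 3, 1, 4, 2, 3]})

# Default treatment coding: the first level (1) is the reference group.
dm_default = dmatrix("C(color)", data=demo)
# Treatment(4): level 4 becomes the reference group instead.
dm_ref4 = dmatrix("C(color, Treatment(4))", data=demo)
# Dropping the intercept gives one indicator column per level.
dm_no_intercept = dmatrix("C(color) - 1", data=demo)

for dm in (dm_default, dm_ref4, dm_no_intercept):
    print(dm.design_info.column_names)

# The row with color == 4 (index 3) has zeros in every indicator column
# when level 4 is the reference group.
ref4_rows = np.asarray(dm_ref4)
print(ref4_rows[3])
```

In each coding with an intercept, the reference level is the one with no indicator column of its own.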

  38. Let's practice!

  39. Categorical and interaction terms
