Going beyond linear regression: Generalized Linear Models in Python (PowerPoint presentation)

  1. Going beyond linear regression
     GENERALIZED LINEAR MODELS IN PYTHON
     Ita Cirovic Donev, Data Science Consultant

  2. Course objectives
     Learn the building blocks of GLMs, train GLMs, interpret model results, assess model performance, and compute predictions.
     Chapter 1: How are GLMs an extension of linear models?
     Chapter 2: Binomial (logistic) regression
     Chapter 3: Poisson regression
     Chapter 4: Multivariate logistic regression

  7. Review of linear models
     salary ∼ experience
     $\text{salary} = \beta_0 + \beta_1 \times \text{experience} + \epsilon$
     $y = \beta_0 + \beta_1 x_1 + \epsilon$
     where:
     $y$: response variable (output)
     $x$: explanatory variable (input)
     $\beta$: model parameters ($\beta_0$: intercept, $\beta_1$: slope)
     $\epsilon$: random error
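
The following is a minimal NumPy sketch of estimating $\beta_0$ and $\beta_1$ by ordinary least squares. The experience and salary values are hypothetical. They are generated from a known line plus "noise" that is orthogonal to the intercept and slope columns, so the recovered coefficients can be checked exactly. They are not the course data.

```python
import numpy as np

# Hypothetical data: salary built from beta0 = 25000, beta1 = 9000 plus
# deterministic noise that sums to zero and is uncorrelated with experience.
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
noise = np.array([500.0, -500.0, 0.0, 0.0, -500.0, 500.0])
salary = 25000 + 9000 * experience + noise

# Design matrix: a column of ones for the intercept beta0, then x1.
X = np.column_stack([np.ones_like(experience), experience])

# Least-squares solution of y = X @ beta + epsilon.
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
beta0, beta1 = beta
print(f"salary = {beta0:.0f} + {beta1:.0f} x experience")
```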

  8. LINEAR MODEL: ols()
          from statsmodels.formula.api import ols
          model = ols(formula='y ~ X', data=my_data).fit()
     GENERALIZED LINEAR MODEL: glm()
          import statsmodels.api as sm
          from statsmodels.formula.api import glm
          model = glm(formula='y ~ X', data=my_data, family=sm.families.____()).fit()
     (____ is a placeholder for the distribution family.)

  9. Assumptions of linear models
     Regression function: $E[y] = \mu = \beta_0 + \beta_1 x_1$
     Assumptions:
     - Linear in parameters
     - Errors are independent and normally distributed
     - Constant variance
     Fitted example: $\text{salary} = 25790 + 9449 \times \text{experience}$

  10. What if...?
      The response is binary or a count → not continuous.
      The variance of y is not constant → it depends on the mean.
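
The two standard variance functions show why constant variance fails. For a 0/1 response with mean $p$ the variance is $p(1-p)$. For a Poisson count with mean $\mu$ the variance equals $\mu$. The sketch below just evaluates these two formulas, and the function names are our own.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 (Bernoulli) response with mean p."""
    return p * (1 - p)


def poisson_variance(mu: float) -> float:
    """Variance of a Poisson count with mean mu (equal to the mean)."""
    return mu


# The variance changes as the mean changes, unlike the linear model's constant variance.
for p in (0.1, 0.5, 0.9):
    print(f"binary, mean {p}: variance {bernoulli_variance(p):.2f}")
for mu in (1, 5, 20):
    print(f"count,  mean {mu}: variance {poisson_variance(mu)}")
```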

  11. Dataset: nesting of horseshoe crabs [1]
      Variable   Description
      sat        Number of satellites residing in the nest
      y          There is at least one satellite residing in the nest (0/1)
      weight     Weight of the female crab in kg
      width      Width of the female crab in cm
      color      1 = light medium, 2 = medium, 3 = dark medium, 4 = dark
      spine      1 = both good, 2 = one worn or broken, 3 = both worn or broken
      [1] A. Agresti, An Introduction to Categorical Data Analysis, 2007.

  12. Linear model and binary response
      satellite crab ∼ female crab weight
      y ~ weight
      P(satellite crab is present) = P(y = 1)

  13. Linear model and binary response

  16. Linear model and binary data
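
A straight line fit to a 0/1 response has a known weakness: it can produce fitted "probabilities" below 0 or above 1. The sketch below fits ordinary least squares to hypothetical weight and satellite data (not the course dataset). It then evaluates the line outside the observed weights.

```python
import numpy as np

# Hypothetical female crab weights (kg) and 0/1 satellite indicator.
weight = np.array([1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 2.8, 3.0])
y = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Ordinary least squares: y ~ weight, treating the 0/1 response as continuous.
X = np.column_stack([np.ones_like(weight), weight])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Probabilities" from the straight line, including weights outside the data range.
new_weight = np.array([0.5, 2.0, 5.0])
p_hat = beta[0] + beta[1] * new_weight
print(p_hat)  # the first value is negative and the last exceeds 1
```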

  18. From probabilities to classes
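
One common way to turn a predicted probability $P(y = 1)$ into a class label is to compare it with a cutoff. The sketch below assumes a cutoff of 0.5, which is a convention rather than something the model fixes. The helper name `to_classes` and the probabilities are hypothetical.

```python
import numpy as np


def to_classes(probabilities, threshold=0.5):
    """Map predicted probabilities P(y = 1) to 0/1 class labels."""
    return (np.asarray(probabilities) >= threshold).astype(int)


# Hypothetical predicted probabilities.
probs = [0.10, 0.45, 0.60, 0.90]
print(to_classes(probs))       # [0 0 1 1]
print(to_classes(probs, 0.4))  # [0 1 1 1]
```

Lowering the cutoff labels more observations as class 1, which trades false negatives for false positives.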

  19. Let's practice!

  20. How to build a GLM?

  21. Components of the GLM

  26. Continuous → Linear regression
      Data type: continuous
      Domain: $(-\infty, \infty)$
      Examples: house price, salary, person's height
      Family: Gaussian()
      Link: identity, $g(\mu) = \mu = E(y)$
      Model = Linear regression

  27. Binary → Logistic regression
      Data type: binary
      Domain: $\{0, 1\}$
      Examples: True/False
      Family: Binomial()
      Link: logit
      Model = Logistic regression

  28. Count → Poisson regression
      Data type: count
      Domain: $0, 1, 2, \dots$
      Examples: number of votes, number of hurricanes
      Family: Poisson()
      Link: logarithm
      Model = Poisson regression

  29. Link functions
      Link: $\eta = g(\mu)$        glm(family=...)      Density            Default link
      $\eta = \mu$                 Gaussian()           Normal             identity
      $\eta = \log(\mu)$           Poisson()            Poisson            logarithm
      $\eta = \log[p/(1-p)]$       Binomial()           Binomial           logit
      $\eta = 1/\mu$               Gamma()              Gamma              inverse
      $\eta = 1/\mu^2$             InverseGaussian()    Inverse Gaussian   inverse squared
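
The link functions in the table each have an inverse that maps the linear predictor $\eta$ back to the mean $\mu$. The sketch below writes out each link and its inverse with NumPy and checks that they round-trip. The dictionary `LINKS` is our own construction for illustration, not a statsmodels object.

```python
import numpy as np

# Each entry: (link g(mu) -> eta, inverse link g^{-1}(eta) -> mu).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (np.log, np.exp),
    "logit": (lambda p: np.log(p / (1 - p)), lambda eta: 1 / (1 + np.exp(-eta))),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "inverse squared": (lambda mu: 1 / mu**2, lambda eta: 1 / np.sqrt(eta)),
}

mu = np.array([0.2, 0.5, 0.8])  # means that are valid for every link above
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:>15}: eta = {np.round(eta, 3)}, back to mu = {np.round(g_inv(eta), 3)}")

# The inverse logit maps any real eta into (0, 1), so fitted probabilities stay valid.
print(LINKS["logit"][1](np.array([-30.0, 0.0, 30.0])))
```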

  30. Benefits of GLMs
      A unified framework for many different data distributions
      - Exponential family of distributions
      Link function
      - Transforms the expected value of y
      - Enables linear combinations
      Many techniques from linear models apply to GLMs as well
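
A single fitting routine can serve many families. The sketch below is iteratively reweighted least squares (IRLS), the method the statsmodels summary reports as `Method: IRLS`, and it fits a logistic and a Poisson model with the same code. Only the link, inverse link and variance function change between the two.

The sketch makes several assumptions:
- It handles canonical links only (logit for Binomial, log for Poisson), where the working weights reduce to the variance function.
- It starts from $\mu = (y + \bar{y})/2$.
- It has no safeguards such as step halving.
- The data are hypothetical.

```python
import numpy as np


def irls(X, y, link, inv_link, variance, max_iter=50, tol=1e-10):
    """Fit a GLM with a canonical link by iteratively reweighted least squares."""
    mu = (y + y.mean()) / 2          # starting means inside the valid range
    eta = link(mu)                   # starting linear predictor
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = variance(mu)             # working weights (canonical link: dmu/deta = V(mu))
        z = eta + (y - mu) / w       # working response
        XtW = X.T * w                # X^T W without building the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new           # updated linear predictor
        mu = inv_link(eta)           # updated fitted means
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# The two families differ only in link, inverse link and variance function.
BINOMIAL = dict(link=lambda p: np.log(p / (1 - p)),
                inv_link=lambda eta: 1 / (1 + np.exp(-eta)),
                variance=lambda mu: mu * (1 - mu))
POISSON = dict(link=np.log, inv_link=np.exp, variance=lambda mu: mu)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X = np.column_stack([np.ones_like(x), x])
y_binary = np.array([0, 0, 1, 0, 1, 0, 1, 1])  # hypothetical 0/1 response
y_count = np.array([0, 1, 1, 2, 3, 5, 4, 8])   # hypothetical counts

beta_logit = irls(X, y_binary, **BINOMIAL)
beta_pois = irls(X, y_count, **POISSON)
print("logistic:", beta_logit)
print("poisson: ", beta_pois)
```

At convergence, a canonical-link GLM satisfies the score equations $X^\top (y - \hat{\mu}) = 0$. This gives a quick correctness check for any family plugged into the routine.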

  31. Let's practice

  32. How to fit a GLM in Python?

  33. statsmodels
      Importing statsmodels:
          import statsmodels.api as sm
      Support for formulas:
          import statsmodels.formula.api as smf
      Use glm() directly:
          from statsmodels.formula.api import glm

  34. Process of model fit
      1. Describe the model → glm()
      2. Fit the model → .fit()
      3. Summarize the model → .summary()
      4. Make model predictions → .predict()

  35. Describing the model
      FORMULA based:
          from statsmodels.formula.api import glm
          model = glm(formula=formula, data=data, family=family)
      ARRAY based:
          import statsmodels.api as sm
          X = sm.add_constant(X)  # the intercept column must be added explicitly
          model = sm.GLM(y, X, family=family)

  36. Formula argument
      response ∼ explanatory variable(s), i.e. output ∼ input(s)
          formula = 'y ~ x1 + x2'
      C(x1): treat x1 as a categorical variable
      -1: remove the intercept
      x1:x2: an interaction term between x1 and x2
      x1*x2: an interaction term between x1 and x2, plus the individual variables
      np.log(x1): apply vectorized functions to model variables
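
One way to see what each formula term does is to build the model without fitting it and list the design-matrix columns through `exog_names`. The DataFrame below, including the categorical column `g`, is hypothetical. The family is left at its Gaussian default because only the column names matter here.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import glm

# Hypothetical data with two numeric inputs and one categorical input.
df = pd.DataFrame({
    "y":  [1.0, 2.0, 2.5, 3.5, 4.0, 5.5],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
    "g":  ["a", "b", "c", "a", "b", "c"],
})

formulas = ["y ~ x1 + x2", "y ~ C(g)", "y ~ x1 - 1",
            "y ~ x1:x2", "y ~ x1*x2", "y ~ np.log(x1)"]

names = {}
for formula in formulas:
    # np must be importable where the formula is evaluated for np.log(x1) to work.
    names[formula] = glm(formula, data=df).exog_names
    print(f"{formula:<16} -> {names[formula]}")
```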

  37. Family argument
      family = sm.families.____()
      The family functions:
      Gaussian(link=sm.families.links.Identity()) → the default family
      Binomial(link=sm.families.links.Logit()); other links: probit, cauchy, log, and cloglog
      Poisson(link=sm.families.links.Log()); other links: identity and sqrt
      Other distribution families are documented on the statsmodels website.

  38. Summarizing the model
          print(model_GLM.summary())

  39. Model summary output
                       Generalized Linear Model Regression Results
      ==============================================================================
      Dep. Variable:                      y   No. Observations:                  173
      Model:                            GLM   Df Residuals:                      171
      Model Family:                Binomial   Df Model:                            1
      Link Function:                  logit   Scale:                          1.0000
      Method:                          IRLS   Log-Likelihood:                -97.226
      Date:                Mon, 21 Jan 2019   Deviance:                       194.45
      Time:                        11:30:01   Pearson chi2:                     165.
      No. Iterations:                     4   Covariance Type:             nonrobust
      ==============================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
      ------------------------------------------------------------------------------
      Intercept    -12.3508      2.629     -4.698      0.000     -17.503      -7.199
      width          0.4972      0.102      4.887      0.000       0.298       0.697
      ==============================================================================

  40. Regression coefficients
      .params returns the regression coefficients:
          model_GLM.params
          Intercept   -12.350818
          width         0.497231
          dtype: float64
      .conf_int(alpha=0.05, cols=None) returns the confidence intervals:
          model_GLM.conf_int()
                             0          1
          Intercept  -17.503010  -7.198625
          width        0.297833   0.696629
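
The default intervals are Wald intervals, coef ± z × std err with z ≈ 1.96 for 95% coverage. They can be reproduced from the coefficients and standard errors in the slide 39 summary using only the standard library. Because the printed standard errors are rounded, the result agrees with `conf_int()` to about two or three decimals.

```python
from statistics import NormalDist

# Coefficients and (rounded) standard errors copied from the slide 39 summary.
estimates = {"Intercept": (-12.3508, 2.629), "width": (0.4972, 0.102)}

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval

intervals = {}
for name, (coef, se) in estimates.items():
    intervals[name] = (coef - z * se, coef + z * se)
    print(f"{name:<10} [{intervals[name][0]:.3f}, {intervals[name][1]:.3f}]")
```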

  41. Predictions
      Specify all the model variables in the test data.
      .predict(test_data) computes predictions:
          model_GLM.predict(test_data)
          0    0.029309
          1    0.470299
          2    0.834983
          3    0.972363
          4    0.987941
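
For a logistic GLM, `.predict()` returns the inverse logit of the linear predictor. The sketch below computes that by hand from the coefficients on slide 40. The widths passed in are hypothetical and are not the course's `test_data`, and the helper name `predict_probability` is our own.

```python
import math

# Coefficients from model_GLM.params (slide 40).
beta0, beta1 = -12.350818, 0.497231


def predict_probability(width_cm: float) -> float:
    """P(y = 1 | width) under the fitted logistic model: inverse logit of eta."""
    eta = beta0 + beta1 * width_cm  # linear predictor on the logit scale
    return 1 / (1 + math.exp(-eta))


# Hypothetical widths in cm.
for w in (21.0, 25.0, 30.0):
    print(f"width {w:.1f} cm -> P(y=1) = {predict_probability(w):.3f}")

# Width at which the predicted probability is exactly 0.5 (eta = 0).
print(f"50% width: {-beta0 / beta1:.2f} cm")
```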

  42. Let's practice!
