Going beyond linear regression
GENERALIZED LINEAR MODELS IN PYTHON
Ita Cirovic Donev
Data Science Consultant
Course objectives
Learn the building blocks of GLMs
Train GLMs
Interpret model results
Assess model performance
Compute predictions

Chapter 1: How are GLMs an extension of linear models
Chapter 2: Binomial (logistic) regression
Chapter 3: Poisson regression
Chapter 4: Multivariate logistic regression
Review of linear models
salary ~ experience
$\text{salary} = \beta_0 + \beta_1 \times \text{experience} + \epsilon$
$y = \beta_0 + \beta_1 x_1 + \epsilon$
where:
y - response variable (output)
x - explanatory variable (input)
$\beta$ - model parameters: $\beta_0$ - intercept, $\beta_1$ - slope
$\epsilon$ - random error
LINEAR MODEL - ols()
    from statsmodels.formula.api import ols
    model = ols(formula='y ~ X', data=my_data).fit()

GENERALIZED LINEAR MODEL - glm()
    import statsmodels.api as sm
    from statsmodels.formula.api import glm
    model = glm(formula='y ~ X', data=my_data, family=sm.families.____()).fit()
Assumptions of linear models
Regression function: $E[y] = \mu = \beta_0 + \beta_1 x_1$
Assumptions:
Linear in parameters
Errors are independent and normally distributed
Constant variance
Fitted example: $\text{salary} = 25790 + 9449 \times \text{experience}$
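The fitted line is the regression function with the estimates plugged in: each extra year of experience adds 9449 to the expected salary. A small sketch; the function name predicted_salary is hypothetical.

```python
def predicted_salary(experience: float) -> float:
    """Evaluate the fitted regression function E[y] = 25790 + 9449 * experience."""
    return 25790 + 9449 * experience

# Expected salary for 0, 1 and 5 years of experience
for years in (0, 1, 5):
    print(years, predicted_salary(years))
```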
What if...?
The response is binary or a count → NOT continuous
The variance of y is not constant → it depends on the mean
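Count data shows why the variance can depend on the mean. For Poisson counts the variance equals the mean, so if the mean changes with x, the variance changes too. A quick simulation sketch with numpy (the chosen means and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# For Poisson counts the variance equals the mean, so it cannot stay
# constant when the mean changes with the explanatory variable.
results = {}
for mean in (1.0, 5.0, 20.0):
    counts = rng.poisson(lam=mean, size=100_000)
    results[mean] = (counts.mean(), counts.var())
    print(f"mean={mean:5.1f}  sample mean={counts.mean():6.3f}  sample var={counts.var():6.3f}")
```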
Dataset - nesting of horseshoe crabs [1]
Variable name | Description
sat | Number of satellites residing in the nest
y | There is at least one satellite residing in the nest; 0/1
weight | Weight of the female crab in kg
width | Width of the female crab in cm
color | 1 - light medium, 2 - medium, 3 - dark medium, 4 - dark
spine | 1 - both good, 2 - one worn or broken, 3 - both worn or broken
[1] A. Agresti, An Introduction to Categorical Data Analysis, 2007.
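Because y records whether at least one satellite is present, it can be derived from sat. A sketch on a few made-up rows that only reuse the column names above; these are not values from the real dataset.

```python
import pandas as pd

# Made-up rows with the same column names as the crab data (NOT the real dataset)
crab = pd.DataFrame({
    "sat":    [0, 4, 0, 2, 1],
    "weight": [1.65, 2.60, 2.10, 3.05, 2.30],  # hypothetical values, kg
    "width":  [22.5, 28.3, 26.0, 30.0, 25.8],  # hypothetical values, cm
})

# Binary response: 1 if at least one satellite resides in the nest, else 0
crab["y"] = (crab["sat"] > 0).astype(int)
print(crab)
```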
Linear model and binary response
satellite crab ~ female crab weight
y ~ weight
P(satellite crab is present) = P(y = 1)
Linear model and binary data
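A linear model fitted to a 0/1 response is not bounded. Some fitted "probabilities" can fall below 0 or above 1. A minimal sketch on synthetic data; the variables x, y and df are made up, with y switching from 0 to 1 at x = 5.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic binary data (illustrative only): y switches from 0 to 1 at x = 5
x = np.linspace(0, 10, 200)
df = pd.DataFrame({"x": x, "y": (x > 5).astype(int)})

# Ordinary linear model on the 0/1 response (a "linear probability model")
lpm = ols("y ~ x", data=df).fit()
fitted = lpm.fittedvalues

# A straight line is unbounded, so the extremes leave the [0, 1] range
print(fitted.min(), fitted.max())
```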
From probabilities to classes
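A common way to turn predicted probabilities into classes is to compare them with a cut-off. The sketch below assumes a threshold of 0.5 and uses illustrative probabilities. Other thresholds are possible.

```python
import numpy as np

# Illustrative predicted probabilities P(y = 1)
probs = np.array([0.03, 0.47, 0.83, 0.97, 0.99])

# Assumed cut-off of 0.5: predict class 1 when P(y = 1) >= 0.5
threshold = 0.5
classes = (probs >= threshold).astype(int)
print(classes)  # [0 0 1 1 1]
```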
Let's practice!
How to build a GLM?
Components of the GLM
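The three components usually named for a GLM can be written out using the notation $\mu = E(y)$ and $\eta = g(\mu)$ from the link-function table. This is the standard textbook decomposition, written as a sketch.

```latex
\begin{aligned}
&\text{Random component:}     && y_i \sim \text{a distribution from the exponential family}, \quad \mu_i = E(y_i) \\
&\text{Systematic component:} && \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \\
&\text{Link function:}        && g(\mu_i) = \eta_i
\end{aligned}
```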
Continuous → Linear regression
Data type: continuous
Domain: (−∞, ∞)
Examples: house price, salary, person's height
Family: Gaussian()
Link: identity, $g(\mu) = \mu = E(y)$
Model = Linear regression
Binary → Logistic regression
Data type: binary
Domain: {0, 1}
Examples: True/False
Family: Binomial()
Link: logit
Model = Logistic regression
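The logit link maps a probability in (0, 1) onto the whole real line. Its inverse, the logistic function, maps any linear predictor back into (0, 1). A numpy sketch; the helper names logit and inv_logit are hypothetical.

```python
import numpy as np

def logit(p):
    """Logit link: log-odds of a probability p in (0, 1)."""
    return np.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit (logistic function): maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-eta))

p = np.array([0.1, 0.5, 0.9])
eta = logit(p)
print(eta)             # log-odds; logit(0.5) is 0
print(inv_logit(eta))  # recovers p
```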
Count → Poisson regression
Data type: count
Domain: 0, 1, 2, ..., ∞
Examples: number of votes, number of hurricanes
Family: Poisson()
Link: logarithm
Model = Poisson regression
Link functions
Link η = g(μ) | glm(family=...) | Density | Default link
$\eta = \mu$ | Gaussian() | Normal | identity
$\eta = \log(\mu)$ | Poisson() | Poisson | logarithm
$\eta = \log[p/(1 - p)]$ | Binomial() | Binomial | logit
$\eta = 1/\mu$ | Gamma() | Gamma | inverse
$\eta = 1/\mu^2$ | InverseGaussian() | Inverse Gaussian | inverse squared
Benefits of GLMs
A unified framework for many different data distributions
    Exponential family of distributions
Link function
    Transforms the expected value of y
    Enables linear combinations
Many techniques from linear models apply to GLMs as well
Let's practice
How to fit a GLM in Python?
statsmodels
Importing statsmodels: import statsmodels.api as sm
Support for formulas: import statsmodels.formula.api as smf
Use glm() directly: from statsmodels.formula.api import glm
Process of model fit
1. Describe the model → glm()
2. Fit the model → .fit()
3. Summarize the model → .summary()
4. Make model predictions → .predict()
Describing the model
FORMULA based:
    from statsmodels.formula.api import glm
    model = glm(formula, data, family)
ARRAY based:
    import statsmodels.api as sm
    X = sm.add_constant(X)
    model = sm.GLM(y, X, family)
Formula argument
response ~ explanatory variable(s)
output ~ input(s)
formula = 'y ~ x1 + x2'
C(x1) : treat x1 as a categorical variable
-1 : remove the intercept
x1:x2 : an interaction term between x1 and x2
x1*x2 : an interaction term between x1 and x2 plus the individual variables
np.log(x1) : apply vectorized functions to model variables
Family argument
family = sm.families.____()
The family functions (links are passed as instances, e.g. Logit(), not as classes):
Gaussian(link=sm.families.links.Identity()) → the default family
Binomial(link=sm.families.links.Logit())
    other links: Probit(), Cauchy(), Log(), CLogLog()
Poisson(link=sm.families.links.Log())
    other links: Identity(), Sqrt()
Other distribution families are documented on the statsmodels website.
Summarizing the model
print(model_GLM.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  173
Model:                            GLM   Df Residuals:                      171
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -97.226
Date:                Mon, 21 Jan 2019   Deviance:                       194.45
Time:                        11:30:01   Pearson chi2:                     165.
No. Iterations:                     4   Covariance Type:             nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -12.3508      2.629     -4.698      0.000     -17.503      -7.199
width          0.4972      0.102      4.887      0.000       0.298       0.697
==============================================================================
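Several header figures in the summary are linked. Df Residuals is the number of observations minus the two estimated coefficients. If y is the ungrouped 0/1 indicator from the dataset table, the saturated model has log-likelihood 0, so the deviance equals −2 × log-likelihood. A quick arithmetic check that uses only the printed values:

```python
# Figures copied from the summary table above
n_obs = 173
n_params = 2          # Intercept and width
log_likelihood = -97.226
deviance = 194.45

# Residual degrees of freedom: observations minus estimated parameters
df_resid = n_obs - n_params
print(df_resid)  # 171

# Assuming ungrouped 0/1 data, the saturated log-likelihood is 0,
# so the deviance is -2 times the model log-likelihood
print(-2 * log_likelihood)  # about 194.45
```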
Regression coefficients
.params returns the regression coefficients
.conf_int(alpha=0.05, cols=None) returns the confidence intervals

model_GLM.params
Intercept   -12.350818
width         0.497231
dtype: float64

model_GLM.conf_int()
                   0         1
Intercept -17.503010 -7.198625
width       0.297833  0.696629
Predictions
Specify all the model variables in the test data
.predict(test_data) computes predictions
model_GLM.predict(test_data)
0    0.029309
1    0.470299
2    0.834983
3    0.972363
4    0.987941
Let's practice!