logistic regression mle vs ols3 in excel2013 25 aug 2016
play

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H - PDF document

Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 1 Schield MLE vs. OLS3-Based Logistic Excel 2013 2 Logistic Regression: Background & Goals MLE vs. OLS3 in Excel 2013


  1. Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 1 Schield MLE vs. OLS3-Based Logistic Excel 2013 2 Logistic Regression: Background & Goals MLE vs. OLS3 in Excel 2013 Doing logistic regression properly requires MLE. by Doing MLE in Excel is not easy. See Schield 2014a Milo Schield Schield has identified three OLS shortcuts: Member: International Statistical Institute OLS1: Model Ln(Odds(p)) where p is near 0 or 1 US Rep: International Statistical Literacy Project OLS2: Model Ln(Odds(p)) where p is grouped data Director, W. M. Keck Statistical Literacy Project OLS3: Use OLS to estimate logistic parameters. Slides and data at: www.StatLit.org/ These slides compare OLS3 with MLE. pdf/2014-Schield-Logistic-MLE-OLS3-Excel2013-slides.pdf Schield (2014b) presents the OLS3-based approach. Excel/2014-Schield-Logistic-MLE-OLS3-Excel2013.xlsx V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 3 Schield MLE vs. OLS3-Based Logistic Excel 2013 4 Model Gender by Height (OLS) Predict Gender using Height Must use logistic regression Source: Minitab Pulse dataset This trend-line does not satisfy the least-squares assumptions and it goes outside the valid range. but it does pass through the joint mean. V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 5 Schield MLE vs. OLS3-Based Logistic Excel 2013 6 1a: MLE Logistic Regression 1b: MLE Logistic Regression of Gender by Height vs OLS(Y|X) of Gender on Height Different slopes but they intersect near P(Y) = 0.5 MLE Logistic Excel 2013 Gender by Height MLE Logistic and Excel 2013 Gender by Height 1 OLS Gender on Height Men 100% Probability (Male) MLE Logistic regression 0.75 P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} Based on MLE logistic regression a = ‐53.32 ; b = 0.7905. Probability (Male) 75% P(Male) = 0.5 when X = 67.4529 y = 0.0953x ‐ 5.9282 0.5 X = Xo if P(Y) = 0.5 Xo = ‐a/b = 67.4529 Slope (X = 67.45) = 0.1976 R² = 0.5102 Slope(X=Xo) = b/4 = 0.1976 50% 0.5 = 0.0953*Xc ‐ 5.982 0.25 Xc = (0.5 + 5.982)/ 0.0953 25% Xc = 67.4626 Women 0 Slope = 0.0953 60 62 64 66 68 70 72 74 76 0% Height (inches) 60 62 64 66 68 70 72 74 76 Pulse.xls Schield Height (inches) Pulse.xls Schield 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 1

  2. Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 7 Schield MLE vs. OLS3-Based Logistic Excel 2013 8 1c: MLE Logistic Regression 1d: Intersection from OLS(Y|X) vs OLS(X|Y) of Height on Gender Get slope from OLS(X|Y) Similar slopes near P(Y) = 0.5 Very nice estimate! MLE Logistic and OLS Excel 2013 Excel 2013 Gender by Height Logistic: Gender by Height: MLE vs OLS‐Based of Height by Gender 1 1 Men Men MLE logistic regression Probability (Male) Probability (Male) 0.8 Estimate: Dashed line Ave Height of Men: 70.75" 0.75 P(Male) = 0.5 when X = 67.4529 Ave Ht of All: 68.72" Slope (X = 67.45) = 0.1976 0.6 62% are men P = 1/{1+exp[4(Xo-X)/(M2-M1]} 0.5 0.4 P = 1/{1+exp[4(67.45-X)/(5.35)]} Ave Height of OLS2 Height|Gender 0.25 MLE: Solid line 0.2 Women 65.40" Means difference: 5.35 Women Women Slope = 1/5.35 = 0.1868 0 0 60 62 64 66 68 70 72 74 76 60 62 64 66 68 70 72 74 76 Height (inches) Height (inches) Schield Pulse.xls Pulse.xls Schield V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 9 Schield MLE vs. OLS3-Based Logistic Excel 2013 10 2a: MLE Logistic Regression 2b: MLE Logistic Regression of Gender by Weight. vs OLS(Y|X) of Gender on Weight . Different slopes but they intersect near P(Y) = 0.5 MLE Logistic Excel 2013 Gender by Weight MLE Logistic and Excel 2013 1 Gender by Weight OLS Gender on Weight Men Probability (Male) MLE Logistic regression 100% 0.75 P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} MLE logistic regression a = ‐21.48 ; b = 0.1577. 75% Probability (Male) P(Male) = 0.5 when X = 136.22# y = 0.0146x ‐ 1.4956 0.5 X = Xo if P(X) = 0.5 Xo = ‐a/b = 136.218 Slope (X=136.22) = 0.0394 R² = 0.5022 Slope(X=Xo) = b/4 = 0.0394 50% 0.5 = 0.0146*Xc ‐ 1.4956 0.25 Xc = (0.5 + 1.4956)/ 0.0146 Women 25% Xc = 136.7 0 OLS Slope = 0.0146 90 115 140 165 190 215 0% Weight (pounds) 100 110 120 130 140 150 160 170 Pulse.xls Schield Weight (pounds) Pulse.xls Schield V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 11 Schield MLE vs. OLS3-Based Logistic Excel 2013 12 2c: MLE Logistic Regression 2d: Intersection from OLS(Y|X) vs OLS(X|Y) of Weight on Gender Get slope from OLS(X|Y) Similar slopes near P(Y) = 0.5 Fairly good estimate MLE Logistic and OLS Excel 2013 Excel 2013 Gender by Weight Logistic: Gender by Weight: MLE vs OLS‐Based of Weight by Gender 1.00 1 Men Men MLE logistic regression Probability (Male) Probability (Male) MLE: Solid line 0.75 Ave Weight Men: 158.26 0.75 P(Male) = 0.5 when X = 136.22# Estimate: Dashed line Slope (X=136.22) = 0.0394 Ave Wt of All: 145.15# 0.50 62% are men 0.5 Ave Weight of OLS Weight|Gender 0.25 Women 123.80# 0.25 Means difference: 34.46 Slope = 1/34.46 = 0.0290 Women Women 0.00 0 110 120 130 140 150 160 170 100 110 120 130 140 150 160 170 180 Weight (pounds) Weight (pounds) Pulse.xls Schield Pulse.xls Schield 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 2

  3. Logistic Regression: MLE vs. OLS3 in Excel2013 25 Aug 2016 V0H V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 13 Schield MLE vs. OLS3-Based Logistic Excel 2013 14 3a: MLE Logistic Regression 3b: MLE Logistic Regression of Gender by Rest Pulse vs OLS(Y|X) of Gender on Pulse1 MLE and OLS1 match near P(Y|X) = 0.5 Almost flat! Discriminatory power is weak. MLE Logistic and MLE Logistic Excel 2013 Excel 2013 Gender by Rest Pulse Gender by Rest Pulse OLS Gender on Pulse1 100% 1 Men Probability (Male) P(Y|X) = 1 / {1 + Exp[‐(a + bX)]} 75% 0.75 Probability (Male) y = ‐0.0127x + 1.5417 a = 4.61; b = ‐0.0561. R² = 0.0814 Slope(X=Xo) = b/4 = ‐0.0140 X = Xo if P(X) = 0.5 Xo = ‐a/b = 82.22 bpm 50% P(Y) = 0.5 if X = 82.32 0.5 MLE logistic regression Slope = ‐0.0127 P(Male) = 0.5 when Xo = 82.22 bpm 0.25 25% Slope = ‐0.0140 MLE Logistic regression Women 0 0% 40 50 60 70 80 90 100 110 120 40 50 60 70 80 90 100 110 120 Rest Pulse (bpm) Rest Pulse (bpm) Pulse.xls Schield Pulse.xls Schield V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 15 Schield MLE vs. OLS3-Based Logistic Excel 2013 16 3c: MLE Logistic Regression 3d: Intersection from OLS(Y|X); vs OLS(X|Y) of Gender on Pulse1 Get slope from OLS(X|Y) Very dissimilar slopes near P(Y) = 0.5 Very bad estimate MLE Logistic and OLS Excel 2013 Gender by Rest Pulse Excel 2013 Logistic: Gender by Pulse1 : MLE vs OLS‐Based of Rest Pulse by Gender 1 1 Men Ave rest pulse of Men: 70.42 Men Probability (Male) 0.8 Ave rest pulse of All: 72.87 bpm Probability (Male) P = 1/{1+exp[4(Xo-X)/(M2-M1]} 0.75 62% are men Estimate: Dashed line MLE logistic regression 0.6 P = 1/{1+exp[4(82.32-X)/(-6.44)]} Slope = ‐0.0140 MLE: Solid line 0.5 0.4 OLS Pulse1|Gender Ave rest pulse of Means difference: ‐6.44 0.25 0.2 Women 76.86 bpm Slope = 1/(‐6.44) = ‐0.1544 Women Women 0 0 50 60 70 80 90 100 40 50 60 70 80 90 100 110 120 Rest Pulse (bpm) Weight (pounds) Pulse.xls Schield Pulse.xls Schield V0H V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 17 Schield MLE vs. OLS3-Based Logistic Excel 2013 18 Analysis and Conclusion References & Derivation Model Gender: #1: R 2 = 51% by Height. “Good estimate” Schield, Milo (2014a). www.statlit.org/pdf/ #2: R 2 = 50% by Weight. “Fair estimate” Model-Logistic-MLE1A-Excel2013-Slides.pdf #3: R 2 = 8% by Rest Pulse. “Bad estimate” Model-Logistic-MLE1C-Excel2013-Slides.pdf Conclusion #1: Using OLS(Y|X) for Xo and Schield, Milo (2014b). www.statlit.org/pdf/ using OLS(X|Y) for the associated slope 2014-Schield-Logistic-OLS3-Excel2013-Slides.pdf works fairly well when the overlap is small or Ln(Odds) = a+bX. P/(1-P) = exp(a+bX) moderate: OLS(Y|X). R-squared is high: > 0.5 P = 1 / [1 + exp(-a –bX)]. If P=1/2 at X=Xo, a = -bXo. Conclusion #2: Must use MLE when the overlap P = 1 / {1 + exp[b(Xo – X)]}. b = 4*(dp/dx)|X=Xo. is large: OLS(Y|X) R-squared is low (< 0.5). P = 1 / {1 + exp[4*(Xo-X)*Slope]} But why bother if the model explains so little? If slope = 1/(M2-M1), P = 1/{1+exp[4(Xo-X)/(M2-M1]} 2014-Schield-Logistic-MLE-OLS3-Excel2013-Slides.pdf 3

  4. V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 1 Logistic Regression: MLE vs. OLS3 in Excel 2013 by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project Director, W. M. Keck Statistical Literacy Project Slides and data at: www.StatLit.org/ pdf/2014-Schield-Logistic-MLE-OLS3-Excel2013-slides.pdf Excel/2014-Schield-Logistic-MLE-OLS3-Excel2013.xlsx

  5. V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 2 Background & Goals Doing logistic regression properly requires MLE. Doing MLE in Excel is not easy. See Schield 2014a Schield has identified three OLS shortcuts: OLS1: Model Ln(Odds(p)) where p is near 0 or 1 OLS2: Model Ln(Odds(p)) where p is grouped data OLS3: Use OLS to estimate logistic parameters. These slides compare OLS3 with MLE. Schield (2014b) presents the OLS3-based approach.

  6. V0H Schield MLE vs. OLS3-Based Logistic Excel 2013 3 Predict Gender using Height Source: Minitab Pulse dataset

Recommend


More recommend