
An Introduction to the Analysis of Rare Events (Nate Derby, Stakana Analytics): PowerPoint presentation

An Introduction to the Analysis of Rare Events. Nate Derby, Stakana Analytics, Seattle, WA. SUCCESS, 3/12/15. Sections: Linear Regression; Poisson Regression; Beyond Poisson Regression.



  2. Outline
     1. Linear Regression: Statistical Modeling with Linear Regression; Linear Regression with Rare Events
     2. Poisson Regression: Fitting the Model; Interpreting the Results; Getting Predicted Counts
     3. Beyond Poisson Regression

  3. Statistical Modeling with Linear Regression
     Suppose we have a data set of two variables, $X_i$ and $Y_i$. We use $X_i$ to estimate $Y_i$: we'll know $X_i$ but not $Y_i$.
     Look at annual fuel consumption vs. driver population percentage.
     Generate scatterplot:
       SYMBOL1 COLOR=blue ...;
       PROC GPLOT DATA=home.fuel;
         PLOT fuel*dlic=1 / ...;
       RUN;

  4. [Figure: Fuel Consumption vs Driver Population Percentage, Scatterplot. Horizontal axis: Driver Population Percentage; vertical axis: Annual Fuel Consumption per Person (x 1000 gallons).]

  5. Statistical Modeling with Linear Regression
     Statistical model = we fit a trend line to the data: the line that best describes the general trend.
     Linear regression model:
       $Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{linear trend}} + \underbrace{\varepsilon_i}_{\text{error term}}$
     Fitted model:
       $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
     This estimates the (unknown) $Y_i$ from the (known) $X_i$.
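The least-squares estimates behind the fitted line have a closed form: $\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. A minimal sketch in Python (standard library only); the function name `fit_line` and the toy data are hypothetical, not from the slides:

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y_i = b0 + b1*X_i + e_i; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: cross-deviations of X and Y over squared deviations of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers that line.
b0, b1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)  # prints 1.0 2.0
```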

  6. Graphing a Linear Regression Line
     Quickly fit and graph a linear regression line:
       SYMBOL1 COLOR=blue ...;
       SYMBOL2 LINE=1 COLOR=red INTERPOL=rl ...;
       PROC GPLOT DATA=home.fuel;
         ...
         PLOT fuel*dlic=1 fuel*dlic=2 / ... OVERLAY;
       RUN;
     Log output:
       NOTE: Regression equation : fuel = 9.617975 + 57.20502*dlic.
     So the fitted model is $\widehat{\mathrm{FUEL}}_i = 9.617975 + 57.20502 \cdot \mathrm{DLIC}_i$.
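Once the coefficients are known, estimating an unknown $Y_i$ is just plugging $X_i$ into the fitted equation. A small Python sketch using the coefficients from the NOTE above; it assumes DLIC is stored as a proportion (0.9 for 90%), which fits the plotted axis ranges but is not stated on the slide, and `predict_fuel` is a hypothetical name:

```python
B0 = 9.617975  # intercept from the SAS log NOTE
B1 = 57.20502  # slope from the SAS log NOTE

def predict_fuel(dlic):
    """Estimated annual fuel consumption for a driver population share dlic.

    Assumes dlic is a proportion (e.g. 0.9 for 90%).
    """
    return B0 + B1 * dlic

for dlic in (0.7, 0.9, 1.1):
    print(f"{dlic:.0%}: {predict_fuel(dlic):.2f}")
# 70%: 49.66
# 90%: 61.10
# 110%: 72.54
```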

  7. [Figure: Fuel Consumption vs Driver Population Percentage, Linear Regression Line. Horizontal axis: Driver Population Percentage; vertical axis: Annual Fuel Consumption per Person (x 1000 gallons).]

  8. Adding Prediction Intervals
     Let's add 95% prediction intervals:
       SYMBOL1 COLOR=blue ...;
       SYMBOL3 LINE=1 COLOR=red INTERPOL=rlcli ...;
       PROC GPLOT DATA=home.fuel;
         ...
         PLOT fuel*dlic=1 fuel*dlic=3 / ... OVERLAY;
       RUN;
     About 95% of the data points should fall within these intervals, and this should also hold for future data points!
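The standard 95% prediction interval for a new observation at $x_0$ under the simple linear model is $\hat{Y}_0 \pm t_{n-2,\,0.975}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}}$, where $s$ is the residual standard error. As far as I know this is the kind of band INTERPOL=rlcli draws, but treat that as my reading of the option. A minimal Python sketch; `prediction_interval` and the toy data are hypothetical, and the $t$ quantile is passed in because the standard library has no $t$ distribution:

```python
import math

def prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval for a new Y at x0 under simple linear regression.

    t_crit is the 97.5th percentile of a t distribution with n - 2 degrees
    of freedom (looked up from a table) for a 95% interval.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error s, with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # The leading 1 accounts for the new observation's own error term.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y0 = b0 + b1 * x0
    return y0 - half_width, y0 + half_width

# n = 5 points, so 3 degrees of freedom; t(0.975, 3) is about 3.182.
print(prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182))
```

The interval is narrowest at $x_0 = \bar{X}$ and widens toward the edges of the data, which is why the plotted bounds curve slightly.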

  9. [Figure: Fuel Consumption vs Driver Population Percentage, Linear Regression Line + 95% Prediction Bounds. Horizontal axis: Driver Population Percentage; vertical axis: Annual Fuel Consumption per Person (x 1000 gallons).]

  10. Not Just for Straight Lines
     Quadratic trend: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
     Cubic trend: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \varepsilon_i$
     Quadratic/cubic trends:
       SYMBOL1 COLOR=blue ...;
       SYMBOL4 LINE=1 COLOR=red INTERPOL=rqcli ...;
       SYMBOL5 LINE=1 COLOR=red INTERPOL=rccli ...;
       PROC GPLOT DATA=home.fuel;
         PLOT fuel*dlic=1 fuel*dlic=4 / ... OVERLAY;
       RUN;
       PROC GPLOT DATA=home.fuel;
         PLOT fuel*dlic=1 fuel*dlic=5 / ... OVERLAY;
       RUN;
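Quadratic and cubic trends are still linear regression: the model is linear in the $\beta$'s, so least squares simply uses the columns $1, X_i, X_i^2, \ldots$ as predictors. A minimal Python sketch that solves the normal equations by Gaussian elimination; `fit_polynomial` is a hypothetical name, and production software typically uses a more numerically stable method (e.g. QR) than normal equations:

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of Y = b0 + b1*X + ... + b_d*X^d; returns [b0, ..., b_d].

    Solves the normal equations (A^T A) b = A^T y, where row i of A is
    [1, X_i, X_i^2, ..., X_i^d].
    """
    k = degree + 1
    # A^T A and A^T y built directly from power sums.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(k)] for r in range(k)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row[:] + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (m[r][k] - sum(m[r][c] * b[c] for c in range(r + 1, k))) / m[r][r]
    return b

# Toy data lying exactly on y = 1 - 2x + 0.5x^2 (quadratic trend).
xs = [0, 1, 2, 3, 4, 5]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]
print(fit_polynomial(xs, ys, 2))  # approximately [1.0, -2.0, 0.5]
```

Passing `degree=3` gives the cubic trend.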

  11. [Figure: Fuel Consumption vs Driver Population Percentage, Quadratic Regression Line + 95% Prediction Bounds. Horizontal axis: Driver Population Percentage; vertical axis: Annual Fuel Consumption per Person (x 1000 gallons).]

  12. [Figure: Fuel Consumption vs Driver Population Percentage, Cubic Regression Line + 95% Prediction Bounds. Horizontal axis: Driver Population Percentage; vertical axis: Annual Fuel Consumption per Person (x 1000 gallons).]

  13. Linear Regression with Rare Events
     Rare event: there is no strict rule of thumb, but
       Any disease is considered a rare event.
       Any event as frequent as a disease can be considered rare.
     It depends on the time unit:
       Earthquakes in the past ten years = rare.
       Earthquakes in the past million years = not so rare.
     Our rule of thumb: an event is rare if the number of events in a time period is in the single digits.
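The time-unit point can be made concrete: at a constant event rate, the expected count grows with the length of the window, so the same process can be rare over a decade and common over a millennium. A tiny Python sketch of the single-digit rule of thumb; the rate of 0.5 events per year and both function names are hypothetical:

```python
def is_rare(count):
    # Rule of thumb: single-digit counts per time period are "rare".
    return count < 10

def expected_count(rate_per_year, years):
    # Expected number of events in a window, assuming a constant rate.
    return rate_per_year * years

# Hypothetical rate of 0.5 events per year.
print(is_rare(expected_count(0.5, 10)))    # True: about 5 events in 10 years
print(is_rare(expected_count(0.5, 1000)))  # False: about 500 events in 1000 years
```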

  14. Exploratory Analysis
     Find a relationship between a rare event $Y_i$ and some variable $X_i$ ($X_i$ may or may not be rare).
     Example: $X_i$ / $Y_i$ = number of workers' compensation claims per firm in the year before/after an inspection by Oregon OSHA.
     Let's look at a scatterplot:
       SYMBOL1 COLOR=blue ...;
       PROC GPLOT DATA=home.claims;
         PLOT post_claims*pre_claims=1 / ...;
       RUN;

  15. [Figure: Pre- vs Post-Inspection Claims, Scatterplot. Horizontal axis: Pre-Inspection Claims; vertical axis: Post-Inspection Claims.]

  16. Scatterplot Not Useful
     Data points are stacked on top of each other! We have 1293 data points but can only see 49.
     Let's look at a bubble plot:
       PROC FREQ DATA=home.claims NOPRINT;
         TABLES post_claims*pre_claims / OUT=stats1 (KEEP=post_claims pre_claims count);
       RUN;
       PROC GPLOT DATA=stats1;
         BUBBLE post_claims*pre_claims=count / ... BSIZE=10;
       RUN;
     BSIZE= determines the bubble sizes.
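The PROC FREQ step collapses the raw data to one row per distinct (pre, post) pair plus a count of firms at that pair; those counts become the bubble sizes. The same tabulation sketched in Python with `collections.Counter`; the function name and the eight-firm data are hypothetical:

```python
from collections import Counter

def bubble_counts(pre_claims, post_claims):
    """Number of firms at each (pre_claims, post_claims) pair."""
    return Counter(zip(pre_claims, post_claims))

# Hypothetical toy data: 8 firms, but only 4 distinct scatterplot points.
pre = [0, 0, 0, 1, 1, 0, 2, 0]
post = [0, 0, 1, 0, 0, 0, 1, 1]
counts = bubble_counts(pre, post)
for (x, y), n in sorted(counts.items()):
    print(f"pre={x} post={y} count={n}")
# pre=0 post=0 count=3
# pre=0 post=1 count=2
# pre=1 post=0 count=2
# pre=2 post=1 count=1
```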

  17. [Figure: Pre- vs Post-Inspection Claims, Bubble Plot. Horizontal axis: Pre-Inspection Claims; vertical axis: Post-Inspection Claims.]

  18. Bubble Plot Not That Useful
     Bubble plots can be difficult to interpret! A box plot is better.
     PROC BOXPLOT works, but its axes aren't consistent with ours.
     Let's make a box plot with PROC GPLOT:
       SYMBOL6 COLOR=blue INTERPOL=boxt00 ...;
       SYMBOL7 COLOR=red VALUE=diamondfilled ...;
       PROC GPLOT DATA=home.claims;
         PLOT post_claims*pre_claims=6 m_post_claims*pre_claims=7 / HAXIS=axis3 VAXIS=axis4 OVERLAY ...;
       RUN;
     INTERPOL=boxt00: whiskers extend to the minima/maxima, with tops/bottoms on the whiskers.
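Each box summarizes the post-inspection claims of all firms sharing one pre-inspection value. A minimal Python sketch of those per-group summaries; it assumes `m_post_claims` (the red diamond) is the group mean, which the slide does not say, uses hypothetical names, and its quartile definition may differ from the one SAS uses:

```python
import statistics
from collections import defaultdict

def box_summaries(pre_claims, post_claims):
    """Per pre_claims value: (min, Q1, median, Q3, max, mean) of post_claims."""
    groups = defaultdict(list)
    for x, y in zip(pre_claims, post_claims):
        groups[x].append(y)
    summaries = {}
    for x, ys in sorted(groups.items()):
        if len(ys) >= 2:
            q1, med, q3 = statistics.quantiles(ys, n=4, method="inclusive")
        else:
            q1 = med = q3 = ys[0]  # a single firm: the box collapses to a point
        summaries[x] = (min(ys), q1, med, q3, max(ys), statistics.mean(ys))
    return summaries

# Hypothetical toy data: five firms with 0 prior claims, one with 1.
print(box_summaries([0, 0, 0, 0, 0, 1], [0, 1, 2, 3, 4, 5]))
```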

  19. Add a Histogram
     It's good to also show the distribution of X = pre-inspection claims.
     We want to use PROC GPLOT for consistency with our axes:
       SYMBOL9 COLOR=blue INTERPOL=boxf00 CV=blue ...;
       PROC GPLOT DATA=stats2;
         PLOT count*pre_claims=9 / HAXIS=axis3 ...;
       RUN;
     INTERPOL=boxf00: whiskers extend to the minima/maxima, and the boxes are filled with the CV= color.
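The histogram's bar heights are just the number of firms at each pre-inspection claim count. A short Python sketch of that tabulation; the function name is hypothetical, and filling in zero-height bars for unobserved values is a design choice here, not something the slide specifies:

```python
from collections import Counter

def histogram_counts(pre_claims):
    """Number of firms at each pre_claims value: the histogram's bar heights."""
    counts = Counter(pre_claims)
    # Include zero-height bars for unobserved values so the axis has no gaps.
    return {x: counts[x] for x in range(max(pre_claims) + 1)}

print(histogram_counts([0, 0, 1, 3]))  # {0: 2, 1: 1, 2: 0, 3: 1}
```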
