Predictive Modeling and By-Peril Analysis for Homeowners Insurance

Contents
• Case for by-peril modeling
• By-peril model building
  - Data
  - Peril grouping
  - Variables
  - Interactions
  - By-peril territories
  - Model validation
• Conclusion

Case for by-peril modeling

Modeling options
1. All peril losses combined
• Regional players
• Limited number of states with fairly constant peril mix
2. Separate by-peril estimates aggregated into a single rate
• True by-peril modeling
• Average the true by-peril estimates
• Practical to do if legacy systems can only implement one set of rates
3. True by-peril rating
• Conceptually makes sense
• Certain variables are predictive for certain perils (e.g. fire protective devices are predictive for the fire peril)
• Responsive to state peril mix differences and to changes in that mix

[Figure: Root mean error (smaller is better) for four approaches: all peril losses combined, aggregate by-peril rate (M1), aggregate by-peril rate (M2), and true by-peril. Annotations: "Significant difference in accuracy" and "Increased predictive accuracy".]
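
One way to read option 2 is as a peril-mix-weighted average of by-peril relativities. The sketch below is only an illustration under that assumption: the relativities, the peril mixes, and the function name `aggregate_relativity` are hypothetical and do not come from the slides. It shows why a single aggregated factor depends on a state's peril mix.

```python
# Hypothetical by-peril relativities for one rating characteristic
# (for example, a fire protective device). Illustrative values only.
relativity_by_peril = {"fire": 0.80, "water": 1.00, "wind": 1.00, "theft": 1.00}

# Hypothetical share of expected losses by peril in two states.
peril_mix = {
    "state_a": {"fire": 0.40, "water": 0.30, "wind": 0.20, "theft": 0.10},
    "state_b": {"fire": 0.10, "water": 0.30, "wind": 0.50, "theft": 0.10},
}


def aggregate_relativity(relativities, mix):
    """Loss-weighted average of by-peril relativities."""
    total_weight = sum(mix.values())
    return sum(relativities[p] * w for p, w in mix.items()) / total_weight


aggregated = {
    state: aggregate_relativity(relativity_by_peril, mix)
    for state, mix in peril_mix.items()
}
for state, value in aggregated.items():
    print(state, round(value, 3))
```

With these made-up inputs, the same fire discount is diluted more in the state where fire is a smaller share of losses, which is the peril-mix sensitivity that true by-peril rating avoids.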

By-peril model building: data staging

Data options
Years used
• Balance volume with recency to reflect an appropriate mix of business
Data used
• Internal data used for non-catastrophes
• Simulated data for catastrophes
Data split
• Modeling/testing/validation
• Modeling/validation
• Out of time data split
• Out of sample data split

By-peril model building: peril grouping

Grouping or breaking of perils: considerations
Availability of detailed/accurate peril codes from claims
• Is the correct cause of loss captured? Wind or hail damage to the roof?
• Are there additional peril code break-outs available?
Grouping or breaking
• Use judgment, intuition, similarity

Grouping or peril separation
Theft
• Disappearance/theft on premises
• Disappearance/theft off premises
Liability
• Liability/Medical payments
Fire
• Human-made fire
• Environmental fire
Water
• Weather water
• Non-weather water
Other
• Glass
• Aircraft
• Vehicles

Examples of fire causes
• Electrical fire
• Grease fire from kitchen
• Fire from candles
• Fire from cigarette smoking
• Children playing with matches
• Fireplace fire
• Fire caused by electrical appliance
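
The grouping step can be pictured as a lookup from claim cause-of-loss labels to modeling perils. The following is a minimal sketch, not the presenter's implementation: the label strings, the `PERIL_GROUP` table, and the choice to send unrecognized causes to "Other" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cause-of-loss labels mapped to the peril groups on the slide.
PERIL_GROUP = {
    "theft on premises": "Theft",
    "theft off premises": "Theft",
    "liability": "Liability",
    "medical payments": "Liability",
    "electrical fire": "Fire",
    "grease fire": "Fire",
    "weather water": "Water",
    "non-weather water": "Water",
    "glass": "Other",
    "aircraft": "Other",
    "vehicles": "Other",
}


def group_peril(cause_of_loss):
    """Return the modeling peril for a claim cause.

    Assumption: causes missing from the table fall into "Other".
    """
    return PERIL_GROUP.get(cause_of_loss.strip().lower(), "Other")


# Hypothetical claim records, identified only by their cause of loss.
claims = ["Grease fire", "Theft off premises", "Glass",
          "Non-weather water", "Electrical fire"]
counts = Counter(group_peril(c) for c in claims)
print(counts)
```

Counting claims per group in this way is one quick check on whether a proposed break-out leaves enough volume in each peril to model.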

By-peril model building: data staging

Practical considerations
• Indications for appropriate rate level
• Systems cost trade-off: cost-benefit analysis of by-peril implementation
• Time constraints and speed to market will dictate some of the options
• Impact on other departments: claims, systems, pricing/actuarial, financial reporting

By-peril model building: variable selection

House characteristics
• Amount of insurance
• Number of stories
• Number of rooms
• Square footage
• Age of electrical
• Age of home
• Age of plumbing
• Age of roof
• Roof material
• Construction type
• Protective devices

Occupant characteristics
• Age
• Gender
• Marital status
• Insurance score
• Occupation/retired
• Number of occupants
• Prior claim activity
• Other personal lines
• Full payment/installments
• Billing lapse
• Good payer

Location and/or external variables
• Weather
• Temperature
• Precipitation
• Elevation
• Slope
• Geography
• Commercial business
• Protection class
• Demographics
• Population density
• Financial variables

Which variables are predictive for which peril?

By-peril model building: variable selection

Univariate analysis
• Predictability by peril
• Shape: continuous or categorical
• Splines
• Transformations
• Equal buckets

[Figures: univariate plots for Age of Home and Amount of Insurance (bucketed), showing exposure, indicated relativity, and +2 SE / -2 SE bands.]

Continuous fits
• Reduced number of parameters
• Allows for extrapolations outside of range
• Avoids out-of-model smoothing
• Complex patterns fitted with piecewise splines

[Figures: the same two plots with a cubic fit (Age of Home) and a piecewise continuous fit (Amount of Insurance) overlaid on the indicated relativities.]
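
The univariate plots pair an indicated relativity with a band of two standard errors per bucket. A rough sketch of that calculation follows; the observations, the bucket width, and the cap are hypothetical, and the standard error here treats the overall mean as fixed, which is a simplification rather than what a fitted model would report.

```python
import math
from statistics import mean, stdev

# Hypothetical (age_of_home, loss_per_exposure) observations.
observations = [
    (2, 80.0), (4, 120.0), (7, 90.0),
    (12, 110.0), (15, 150.0), (18, 130.0),
    (25, 160.0), (28, 150.0),
    (31, 140.0), (38, 180.0),
]


def bucket(age, width=10, cap=30):
    """Equal-width buckets; ages at or above the cap share the top bucket."""
    return min(age // width * width, cap)


overall = mean(loss for _, loss in observations)

# Collect losses per bucket.
by_bucket = {}
for age, loss in observations:
    by_bucket.setdefault(bucket(age), []).append(loss)

# Relativity to the overall mean, with a +/- 2 SE band per bucket.
summary = {}
for b, losses in sorted(by_bucket.items()):
    rel = mean(losses) / overall
    se = stdev(losses) / math.sqrt(len(losses)) / overall
    summary[b] = (rel - 2 * se, rel, rel + 2 * se)
    print(b, [round(x, 3) for x in summary[b]])
```

Wide bands in thin buckets are the usual prompt to regroup the variable or to replace the buckets with a continuous fit.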

Model building: consistency over time

[Figure: Urban vs. Rural trend over time, by exposure year, 2006 to 2009.]
• Looking for a stable trend over time

Variable selection: occupant characteristics

Occupant characteristics
• Age
• Gender
• Marital status
• Insurance score
• Occupation/retired
• Number of occupants
• Other personal lines
• Full payment/installments
• Billing lapse
• Good payer

Options
• Named insured
• Policy level
• Presence of
• Number of
• Maximum/minimum age
• Composition

Considerations
• Data quality
• Correlation with other variables

Variable selection: occupant characteristics

Vendor data
• House characteristics
• Insurance scoring
• Prior claim activity
• Weather
• Demographics
• Elevation

Considerations
• Cost
• How often is the data updated by the vendor?
• How often does the data have to be updated?
• Regulatory support and environment

External data links

Variable selection: predictiveness by peril

[Table: Variable\Peril grid. Columns: Fire, Liability, Theft, Water, Wind, Other. Rows: By-Peril Territory, Insurance Score, Age of Home, Protection Class, Construction Material, Amount of Insurance, Other Lines, Full Pay, Square Feet, Number of Rooms, Claim Free, Retired Flag, Good Payer, Prior Claims, Secondary Residence, Fire Protective Device, Theft Protective Device, Number of Occupants. Cell entries not preserved.]

• By-peril territory, Insurance Score, Amount of Insurance, Full Pay, Age of Home, and Claims History are consistently powerful across all perils
• Territory shows larger spread in weather perils
• Insurance Score is predictive in weather perils

Variable selection: identifying interactions

• Looking for situations where the effect of variable x differs depending on variable y
• Granularity can be a problem, so grouping is often needed before testing for interactions

                   young household   old household
small household          0.6              1.2
large household          0.9              1.1

[Figure: 3-D bar chart of factor by x (low, mid, high) and y (low, mid, high).]

Variable selection: modeling interactions

• Move from categorical*categorical interaction to categorical*continuous interaction

[Figure: factor by Age of Oldest Occupant, with separate curves for 1/2 occupants and 3+ occupants.]
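
The 2x2 table of household factors can be checked for an interaction by comparing the age effect within each household size. The sketch below uses the four factors from the slide; the variable names and the ratio-of-ratios summary are illustrative choices, not something the slides prescribe.

```python
# Factors from the 2x2 table, keyed by (household size, household age).
factor = {
    ("small", "young"): 0.6, ("small", "old"): 1.2,
    ("large", "young"): 0.9, ("large", "old"): 1.1,
}

# Effect of age (old relative to young) within each household size.
age_effect = {
    size: factor[(size, "old")] / factor[(size, "young")]
    for size in ("small", "large")
}

# Ratio of the two age effects. A value of 1.0 would mean the age effect
# is the same for both sizes, i.e. no multiplicative interaction.
interaction_ratio = age_effect["small"] / age_effect["large"]

print({k: round(v, 3) for k, v in age_effect.items()})
print(round(interaction_ratio, 3))
```

Here the old-versus-young effect is much stronger for small households than for large ones, which is the pattern the first bullet describes.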

Variable selection: modeling interactions

[Figures: two plots of factor by Age of Occupant, with curves for 1/2 occupants and 3+ occupants and exposure shown on the secondary axis.]

By-peril territories

Practical considerations
• Territories are a collection of units (5-digit postal code, 3-digit postal code, counties, PUMA, etc.)
• Data at the unit level by peril is noisy due to limited information in one area
• Territories are correlated with other rating variables (e.g. amount of insurance, age of home)

Modeling solutions
• Use territories developed by third parties using industry data
• Use residual risk based on initial models that include house information, occupant information, and external weather, demographic, and geographical data
• Use residual risk based on internal data only

By-peril territories

[Figure: unsmoothed residual risk.]

By-peril territories: smoothing

Distance based
• Nearby units play a bigger role
• Farther units play a smaller role
• Each unit of distance adds the same amount of risk independent of location
• More appropriate for weather related perils

Adjacency based
• Surrounding units play a bigger role
• Outer rings of units play a smaller role

Clustering smoothed residuals
• Maximize variance between clusters
• Minimize variance within clusters
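
Distance-based smoothing can be sketched as a weighted average in which nearby units count more than distant ones. The slides do not specify a kernel, so the Gaussian weight, the `bandwidth` parameter, the exposure weighting, and the unit data below are all assumptions chosen for illustration.

```python
import math

# Hypothetical units: id -> (x, y, unsmoothed residual, exposure).
# Coordinates are centroids in arbitrary distance units.
units = {
    "A": (0.0, 0.0, 1.30, 50.0),
    "B": (1.0, 0.0, 0.90, 400.0),
    "C": (0.0, 1.0, 1.00, 300.0),
    "D": (5.0, 5.0, 1.60, 200.0),
}


def smooth(unit_id, bandwidth=1.5):
    """Exposure- and distance-weighted average of residuals around a unit."""
    x0, y0, _, _ = units[unit_id]
    num = 0.0
    den = 0.0
    for x, y, residual, exposure in units.values():
        distance = math.hypot(x - x0, y - y0)
        # Assumed Gaussian kernel: weight decays as distance grows.
        weight = exposure * math.exp(-((distance / bandwidth) ** 2))
        num += weight * residual
        den += weight
    return num / den


smoothed = {u: smooth(u) for u in units}
for u, value in smoothed.items():
    print(u, round(value, 3))
```

In this toy data the thin unit A is pulled toward its larger neighbors while the isolated unit D barely moves; a larger `bandwidth` would smooth more aggressively.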

By-peril territories: how much smoothing?

Model validation

A couple of options…

Out of sample validation
• Traditional splitting of the entire dataset into modeling and validation sets may not work
• Out of sample validation might fail if the observations are not independent (weather related perils)
• The losses coming from the same "event" would be found in both the modeling and the validation dataset

Out of time validation
• Could solve the independence issue if one year is kept aside for validation
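
The independence concern can be illustrated by counting how many weather events appear on both sides of each kind of split. This is a sketch on synthetic data: the claim records, the `event_id` labels, the 70/30 proportion, and the choice of 2009 as the held-out year are all hypothetical.

```python
import random

# Hypothetical claims: (claim_id, accident_year, event_id).
# event_id marks claims that arose from the same weather event.
claims = [(i, 2006 + i % 4, f"{2006 + i % 4}-E{i % 3}") for i in range(40)]


def events(rows):
    """Set of event ids present in a list of claims."""
    return {event for _, _, event in rows}


# Out-of-sample: random 70/30 split of individual claims.
rng = random.Random(0)
shuffled = claims[:]
rng.shuffle(shuffled)
cut = len(shuffled) * 7 // 10
oos_train, oos_valid = shuffled[:cut], shuffled[cut:]

# Out-of-time: hold out the latest year entirely.
oot_train = [c for c in claims if c[1] < 2009]
oot_valid = [c for c in claims if c[1] == 2009]

shared_oos = events(oos_train) & events(oos_valid)
shared_oot = events(oot_train) & events(oot_valid)
print("events on both sides, out of sample:", len(shared_oos))
print("events on both sides, out of time:", len(shared_oot))
```

Because every event belongs to a single year in this setup, holding out a whole year guarantees that no event straddles the split, whereas a random claim-level split typically leaves some events on both sides.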

Conclusion
• Homeowners predictive modeling could be as sophisticated and innovative as auto modeling
• By-peril modeling is an important way of achieving increased sophistication and accuracy

Contact
David R. MacInnis
Senior Predictive Modeler
Allstate Insurance Company
dmaau@allstate.com