Joseph O. Marker
Marker Actuarial Services, LLC and University of Michigan
CLRS 2010 Meeting
J. Marker, LSMWP, CLRS
Expected vs Actual Distribution
Test distributions of:
  Number of claims (frequency)
  Size of ultimate loss (severity)
Sources of significant difference between actual and expected amounts:
  Programming or communication errors
  Not understanding how the statistical language (e.g. “R”) works
  Errors or misleading results in “R”
Display Raw Simulator Output
Claims file
  Simulation No  Occurrence No  Claim No  Accident Date  Report Date  Line  Type
  1              1              1         20000104       20000227     1     1
  1              2              1         20000105       20000818     1     1
  ...
Transactions file
  Simulation No  Occurrence No  Claim No  Date      Transaction  Case Reserve  Payment
  1              1              1         20000227  REP                  2000        0
  1              1              1         20000413  RES                 89412        0
  1              1              1         20000417  CLS                -91412   141531
  ...
Another use for Testing information
Create Ultimate Loss File for Analysis – Layout
  Simulation No | Occurrence No | Line | Type | Claim No | Accident Date | Report Date | Case Reserve | Payment
Idea: Another use for this section of paper
  If an insurer can summarize its own claim data to this format, then it can use the tests we will discuss to parameterize the Simulator using its data.
  We have included in this paper all the “R” code used in testing.
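A minimal sketch of how the two raw files might be collapsed into the ultimate loss layout. The data frame and column names (`claims`, `trans`, `SimNo`, `OccNo`, and so on) are hypothetical, and the rows are the sample rows from the raw-output slide. The code assumes that case reserve and payment entries are incremental changes, which the sample suggests (2000 + 89412 - 91412 = 0) but the slides do not state.

```r
# Hypothetical column names; rows copied from the sample raw output.
claims <- data.frame(
  SimNo = c(1, 1), OccNo = c(1, 2), ClaimNo = c(1, 1),
  AccidentDate = c(20000104, 20000105),
  ReportDate   = c(20000227, 20000818),
  Line = c(1, 1), Type = c(1, 1)
)
trans <- data.frame(
  SimNo = c(1, 1, 1), OccNo = c(1, 1, 1), ClaimNo = c(1, 1, 1),
  Date = c(20000227, 20000413, 20000417),
  Transaction = c("REP", "RES", "CLS"),
  CaseReserve = c(2000, 89412, -91412),   # assumed to be incremental changes
  Payment     = c(0, 0, 141531)
)

# Sum the transactions of each claim, then attach the totals to the claim record.
totals <- aggregate(cbind(CaseReserve, Payment) ~ SimNo + OccNo + ClaimNo,
                    data = trans, FUN = sum)
ult <- merge(claims, totals, by = c("SimNo", "OccNo", "ClaimNo"), all.x = TRUE)

# Ultimate loss per claim under the assumption above.
ult$UltimateLoss <- ult$CaseReserve + ult$Payment
ult
```

The second sample claim has no transactions in the excerpt, so its amounts come out as `NA`. A real file would carry at least a report transaction for every claim.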
Emphasis in the Paper
  Document the “R” code used in performing various tests.
  Provide references for those who want to explore the modeling more deeply.
  Provide visual as well as formal tests: QQ plots, histograms, densities, etc.
Test 1 – Frequency, Zero-Modification, Trend
Model parameters:
  # Occurrences ~ Poisson (mean = 120 per year)
  1,000 simulations
  One claim per occurrence
  Frequency trend 2% per year, three accident years
  Pr[Claim is Type 1] = 75%; Pr[Type 2] = 25%
  Pr[CNP (“Closed No Payment”)] = 40%
  “Type” and “Status” independent.
Status is a category variable for whether a claim is closed with payment.
Test output to see if its distribution is consistent with assumptions.
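The following is an illustrative stand-in for the Test 1 setup, not the Simulator itself. It draws Poisson occurrence counts and assigns Type and Status independently with the stated probabilities. How the 2% trend is applied (here: annually, with the first accident year untrended) is an assumption, so the total count need not match the Simulator's output.

```r
set.seed(2010)
n_sims    <- 1000
base_mean <- 120
trend     <- 0.02
years     <- 0:2

# Assumption: trend is applied once per accident year, first year untrended.
means  <- base_mean * (1 + trend)^years
counts <- sapply(means, function(mu) rpois(n_sims, mu))  # n_sims x 3 matrix

# One claim per occurrence; Type and Status drawn independently.
n_claims <- sum(counts)
type   <- sample(c("Type 1", "Type 2"), n_claims, replace = TRUE,
                 prob = c(0.75, 0.25))
status <- sample(c("CNP", "CWP"), n_claims, replace = TRUE,
                 prob = c(0.40, 0.60))

tab <- table(Status = status, Type = type)
round(prop.table(tab), 4)   # cell shares, to compare with the assumptions
```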
Test 1 – Classical Chi-square
Contingency Table

                  Actual Counts                      Expected Counts
           Type 1     Type 2     Margin       Type 1      Type 2     Margin
  CNP      111,066    37,007     0.398906     111,029.0   37,044.0   0.398906
  CWP      167,268    55,857     0.601094     167,305.0   55,820.0   0.601094
  Margin   0.749826   0.250174   371,198      0.749826    0.250174   371,198

$$\chi^2 = \sum_i \sum_j \frac{(\text{Actual}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} = 0.0819$$

$\Pr[\chi^2 > 0.0819] = 0.775$. The independence of Type and Status is supported.
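As a quick check, the actual counts from the contingency table can be passed to base R's `chisq.test`. This is a sketch rather than the paper's own code. The continuity correction is switched off so that the statistic is the plain Pearson sum in the formula above.

```r
# Actual counts from the contingency table (rows: Status, columns: Type).
actual <- matrix(c(111066, 37007,
                   167268, 55857),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Status = c("CNP", "CWP"),
                                 Type   = c("Type 1", "Type 2")))

res <- chisq.test(actual, correct = FALSE)  # no Yates continuity correction
round(res$expected, 1)   # expected counts under independence
res$statistic            # should be close to the 0.0819 on the slide
res$p.value              # should be close to the 0.775 on the slide
```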
Test 1 – Regression approach
Previous result can be obtained using the xtabs command in “R”.
Result can also be obtained using a Poisson GLM.
Full model:
    model6x <- glm(count ~ Type + Status + Type*Status,
                   data = temp.datacc.stack, family = poisson, x = TRUE)
Reduced model:
    model5x <- glm(count ~ Type + Status,
                   data = temp.datacc.stack, family = poisson, x = TRUE)
Independence holds if the interaction term Type*Status is not significant.
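A minimal sketch of the `xtabs` route, assuming the counts have been collapsed to one row per Type and Status combination. The data frame `cells` is hypothetical and is not the paper's `temp.datacc.stack`. The `summary` method for contingency tables reports a chi-squared test of independence.

```r
# Hypothetical collapsed data: one row per Status x Type cell.
cells <- data.frame(
  Status = c("CNP", "CNP", "CWP", "CWP"),
  Type   = c("Type 1", "Type 2", "Type 1", "Type 2"),
  count  = c(111066, 37007, 167268, 55857)
)

tab <- xtabs(count ~ Status + Type, data = cells)
s   <- summary(tab)   # chi-squared test for independence of the factors
s$statistic
s$p.value
```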
Test 1 – Analysis of variance
    anova(model5x, model6x, test="Chi")

    Analysis of Deviance Table
    Response: count
      Terms                           Resid. Df  Resid. Dev  Test          Df  Deviance      Pr(Chi)
    1 + Type + Status                 143997     160969.366
    2 Type + Status + Type * Status   143996     160969.284  +Type:Status  1   0.0819088429  0.774727081

Result matches the previous Χ² test.
We did not show here the model coefficients, which will produce the expected frequency for each combination of Type and Status.
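The deviance comparison can be sketched on the four collapsed cell counts instead of the stacked data set used in the paper. The data frame `cells` and the model names `reduced` and `full` are hypothetical. Because the fitted means depend only on the Type and Status cell, the deviance difference for the interaction should agree with the stacked-data result, although the residual deviances themselves will differ.

```r
# Hypothetical collapsed data: one row per Status x Type cell.
cells <- data.frame(
  Status = factor(c("CNP", "CNP", "CWP", "CWP")),
  Type   = factor(c("Type 1", "Type 2", "Type 1", "Type 2")),
  count  = c(111066, 37007, 167268, 55857)
)

reduced <- glm(count ~ Type + Status, data = cells, family = poisson)
full    <- glm(count ~ Type * Status, data = cells, family = poisson)

anova(reduced, full, test = "Chisq")   # likelihood-ratio test of the interaction

# The same test computed by hand from the two deviances.
dev_diff <- deviance(reduced) - deviance(full)
p_value  <- pchisq(dev_diff, df = 1, lower.tail = FALSE)

# Fitted values of the reduced model are the expected counts under independence.
round(fitted(reduced), 1)
```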
Test 2 – Univariate size of loss
Model parameters:
  Three lines – no correlation in frequency by line
  # Claims for each line ~ Poisson (mean = 600 per year)
  Two accident years, 100 simulations
  Size of loss distributions:
    Line 1 – lognormal
    Line 2 – Pareto
    Line 3 – Weibull
  Zero trend in frequency and size of loss.
Expected count = 600 (freq) x 100 (# sims) x 3 (lines) x 2 (years) = 360,000.
Actual # claims: 359,819.
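One way to judge whether 359,819 is consistent with 360,000 is a normal approximation. This check is an illustration and is not taken from the paper. It relies on the fact that a sum of independent Poisson counts is Poisson, so the total has variance equal to its mean.

```r
expected <- 600 * 100 * 3 * 2   # freq x sims x lines x years
actual   <- 359819

# Poisson total: variance equals the mean, so the standard deviation is sqrt(expected).
z <- (actual - expected) / sqrt(expected)
p_two_sided <- 2 * pnorm(-abs(z))

z             # about -0.30 standard deviations
p_two_sided   # a large value indicates no evidence of a frequency discrepancy
```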
Size of loss – testing strategy
The person doing the testing is not the person running the simulation.
  Test all three distributions on each line’s output.
  Produce plots to “get a feel” for distributions.
  Fit using maximum likelihood estimation.
  Produce QQ (quantile-quantile) plots.
  Run formal goodness-of-fit tests.
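The fitting step of this strategy can be sketched in base R. The sample `x` and its parameters are hypothetical stand-ins for one line's losses, and the paper's own fitting code may differ. The Weibull fit minimizes the negative log-likelihood numerically, the lognormal fit uses its closed-form estimates, and the two log-likelihoods give a first comparison of candidate distributions on the same data.

```r
set.seed(11)
# Hypothetical stand-in for one line's ultimate losses.
x <- rweibull(5000, shape = 1.5, scale = 10000)

# Weibull m.l.e.: minimize the negative log-likelihood.
# Optimizing on the log scale keeps shape and scale positive.
nll_weibull <- function(par) {
  -sum(dweibull(x, shape = exp(par[1]), scale = exp(par[2]), log = TRUE))
}
fit_w    <- optim(c(0, log(mean(x))), nll_weibull)
mle_w    <- exp(fit_w$par)       # c(shape, scale)
loglik_w <- -fit_w$value

# Lognormal m.l.e. has a closed form on the log losses.
meanlog_hat <- mean(log(x))
sdlog_hat   <- sqrt(mean((log(x) - meanlog_hat)^2))
loglik_ln   <- sum(dlnorm(x, meanlog_hat, sdlog_hat, log = TRUE))

mle_w
c(weibull = loglik_w, lognormal = loglik_ln)   # higher is better
```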
Size of loss – Histograms and p.d.f.
[Plots, two slides: histograms of the simulated losses with fitted densities]
Size of loss
The plots above compare:
  Histogram of empirical distribution
  Density of the theoretical distribution with m.l.e. parameters
The plots show that both Weibull and Pareto fit Lines 2 and 3 well.
QQ plots offer another perspective.
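A plot of this kind can be sketched as follows. The sample `loss` and its parameters are hypothetical, and the lognormal is used only because its m.l.e. has a closed form. The histogram is drawn on the density scale so that the fitted p.d.f. can be overlaid directly.

```r
set.seed(14)
# Hypothetical stand-in for one line's ultimate losses.
loss <- rlnorm(5000, meanlog = 9, sdlog = 1)

# Closed-form lognormal m.l.e.
meanlog_hat <- mean(log(loss))
sdlog_hat   <- sqrt(mean((log(loss) - meanlog_hat)^2))

# freq = FALSE puts the histogram on the density scale.
h <- hist(loss, breaks = 100, freq = FALSE,
          main = "Histogram and fitted lognormal density",
          xlab = "Ultimate loss")
curve(dlnorm(x, meanlog_hat, sdlog_hat), add = TRUE, col = "red")
```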
Size of loss – QQ Plots
Example of “R” code to produce a QQ Plot:
    # generate a random sample of the same size n2 as the empirical data
    thqua.w2 <- rweibull(n2, shape=fit.w2$estimate[1],
                         scale=fit.w2$estimate[2])
    # ultloss2 is the empirical data, thqua.w2 is the generated sample
    qqplot(ultloss2, thqua.w2, xlab="Sample Quantiles",
           ylab="Theoretical Quantiles", main="Line 2, Weibull")
    abline(0, 1, col="red")
One can also replace the sample with the quantiles of the theoretical Weibull c.d.f.
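The alternative mentioned last, using theoretical quantiles instead of a random sample, might look like the sketch below. The data are simulated and `shape_hat` and `scale_hat` are hypothetical placeholders for the m.l.e. estimates. `ppoints` supplies evenly spaced probabilities, so the plot does not change from run to run the way a random reference sample does.

```r
set.seed(15)
n2 <- 2000
# Hypothetical stand-in for the empirical Line 2 losses.
ultloss2 <- rweibull(n2, shape = 1.2, scale = 8000)

# Placeholders for the fitted parameters (assumed, not from the paper).
shape_hat <- 1.2
scale_hat <- 8000

# Theoretical quantiles at n2 evenly spaced probabilities.
thqua.w2 <- qweibull(ppoints(n2), shape = shape_hat, scale = scale_hat)

qqplot(ultloss2, thqua.w2, xlab = "Sample Quantiles",
       ylab = "Theoretical Quantiles", main = "Line 2, Weibull")
abline(0, 1, col = "red")   # points near this line indicate a good fit
```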
Size of Loss – QQ Plot, Line 1
Size of Loss – QQ Plot, Line 2
Size of Loss – QQ Plot, Line 3
Size of Loss – Fitted distributions
From the QQ plots, it appears that lognormal fits Line 1, Pareto fits Line 2, and Weibull fits Line 3.
Chi-square is a formal goodness-of-fit test. Section 6 discusses setting up the test for Pareto on Line 2. Appendix B contains “R” code for all the chi-square tests.
The Kolmogorov-Smirnov test was also applied, but too late to include results in this presentation.
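Since the Kolmogorov-Smirnov results are not shown, the following only sketches how such a test can be run with base R's `ks.test`. The sample and parameters are hypothetical. Note that when the parameters are estimated from the same data, the reported p-value is no longer exact and tends to be too large.

```r
set.seed(19)
# Hypothetical stand-in for one line's ultimate losses.
loss <- rweibull(2000, shape = 1.2, scale = 8000)

# Compare the empirical c.d.f. with a fully specified Weibull c.d.f.
# (parameters here are the known simulation values, not estimates).
ks <- ks.test(loss, "pweibull", shape = 1.2, scale = 8000)
ks$statistic   # largest vertical distance between the two c.d.f.s
ks$p.value
```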
Size of Loss – Chi-square g.o.f. test
Setting up bins and the expected and actual # claims by bin is not easy in R.
Define break points and bins:
    s = sqrt(var(ultloss2))
    ult2.cut <- cut(ultloss2.0,   ##binning data
       breaks = c(0, m-s/2, m, m+s/4, m+s/2, m+s, m+2*s, 2*max(ultloss2)))
Note: ultloss2.0 is the vector of loss sizes, m = mean.
The table of expected and observed values by bin:
    #           E.2    O.2      x.sq.2
    #[1,] 43993.890  44087  0.19705959
    #[2,] 35651.989  35680  0.02200752
    #[3,] 10493.758  10323  2.77864169
    #[4,]  7240.583   7269  0.11152721
    #[5,]  9277.383   9164  1.38570182
    #[6,]  8063.576   8176  1.56743997
    #[7,]  5289.820   5312  0.09299630
Notes: E.2 = expected number, O.2 = actual number, x.sq.2 = Chi-sq statistic.
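The binning steps above can be sketched end to end on simulated data. Everything here is illustrative: the Pareto parameters `alpha` and `theta`, the helper `ppareto`, and the sample `loss` are hypothetical. Expected counts use the known simulation parameters rather than m.l.e. estimates, so the degrees of freedom are the number of bins minus one. With fitted parameters they would drop further by the number of parameters estimated.

```r
set.seed(20)
alpha <- 5        # hypothetical Pareto shape
theta <- 40000    # hypothetical Pareto scale
n     <- 20000

# Pareto (type II) c.d.f. and inverse-c.d.f. sampling.
ppareto <- function(q) 1 - (theta / (q + theta))^alpha
loss    <- theta * (runif(n)^(-1 / alpha) - 1)

# Break points built from the sample mean and standard deviation.
m <- mean(loss)
s <- sd(loss)
breaks <- c(0, m - s/2, m, m + s/4, m + s/2, m + s, m + 2*s, 2 * max(loss))

# Observed counts by bin, and expected counts from c.d.f. differences.
O    <- as.vector(table(cut(loss, breaks = breaks)))
E    <- n * diff(ppareto(breaks))
x.sq <- (O - E)^2 / E
cbind(E, O, x.sq)

# Overall statistic and p-value (df = bins - 1, no parameters estimated here).
total   <- sum(x.sq)
p_value <- pchisq(total, df = length(O) - 1, lower.tail = FALSE)
```

The break points assume that `m - s/2` is positive, which holds for these parameters but can fail for heavier-tailed samples.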