Summarizing Insurance Scores Using a Gini Index

Edward W. (Jed) Frees
Joint work with Glenn Meyers and Dave Cummins
University of Wisconsin – Madison and Insurance Services Office
May 25, 2010

Outline

1 Introduction
2 The Ordered Lorenz Curve
3 Insurance Scoring
4 Effects of Model Selection
    Under- and Over-Fitting
    Non-Ordered Scores
    Gini Coefficients for Rate Selection
5 Statistical Inference
    Estimating Gini Coefficients
    Comparing Gini Coefficients

Research Motivation

- Would like to consider the degree of separation between insurance losses $y$ and premiums $P$
    - For a typical portfolio of policyholders, the distribution of premiums tends to be relatively narrow and skewed to the right
    - In contrast, losses have a much greater range
    - Losses are predominantly zeros (about 93% for homeowners) and, for $y > 0$, are also right-skewed
    - Difficult to use the squared error loss (mean square error) to measure discrepancies between losses and premiums
- We are proposing several new methods of determining premiums (e.g., instrumental variables, copula regression)
    - How to compare?
    - No single statistical model that could be used as an “umbrella” for likelihood comparisons
- Want a measure that looks not only at statistical significance but also at monetary impact

The Lorenz Curve

- We consider methods that are variations of well-known tools in economics, the Lorenz Curve and the Gini Index.
- A Lorenz Curve
    - is a plot of two distributions
    - In welfare economics, the vertical axis gives the proportion of income (or wealth), the horizontal gives the proportion of people
    - See the example from Wikipedia
The Gini Index

- The 45 degree line is known as the “line of equality”
    - In welfare economics, this represents the situation where each person has an equal share of income (or wealth)
- To read the Lorenz Curve
    - Pick a point on the horizontal axis, say 60% of households
    - The corresponding vertical axis is about 40% of income
    - This represents income inequality
- The farther the Lorenz curve from the line of equality, the greater is the amount of income inequality
- The Gini index is defined to be (twice) the area between the Lorenz curve and the line of equality

The Ordered Lorenz Curve

- We consider an “ordered” Lorenz curve, that varies from the usual Lorenz curve in two ways
    - Instead of counting people, think of each person as an insurance policyholder and look at the amount of insurance premium paid
    - Order losses and premiums by a third variable that we call a relativity
- Notation
    - Let $x_i$ be the set of characteristics (explanatory variables) associated with the $i$th contract
    - Let $P(x_i)$ be the associated premium
    - Let $y_i$ be the loss (often zero)
    - Let $R_i = R(x_i)$ be the corresponding relativity

Example

Suppose we have only $n = 5$ policyholders

    Variable     i         1   2   3   4   5   Sum
    Loss         y_i       5   5   5   4   6   25
    Premium      P(x_i)    4   2   6   5   8   25
    Relativity   R(x_i)    5   4   3   2   1

[Figure: two panels. "Lorenz": Loss Distn (vertical axis) against People Distn (horizontal axis). "Ordered Lorenz": Loss Distn (vertical axis) against Premium Distn (horizontal axis).]

The Ordered Lorenz Curve

Notation: $x_i$ - explanatory variables, $P(x_i)$ - premium, $y_i$ - loss, $R_i = R(x_i)$, $I(\cdot)$ - indicator function, and $E(\cdot)$ - mathematical expectation

The Ordered Lorenz Curve

- Vertical axis

\[ F_L(s) = \frac{E[\, y \, I(R \le s)\,]}{E\, y} \underset{\text{empirical}}{=} \frac{\sum_{i=1}^{n} y_i \, I(R_i \le s)}{\sum_{i=1}^{n} y_i} \]

  that we interpret to be the market share of losses.

- Horizontal axis

\[ F_P(s) = \frac{E[\, P(x) \, I(R \le s)\,]}{E\, P(x)} \underset{\text{empirical}}{=} \frac{\sum_{i=1}^{n} P(x_i) \, I(R_i \le s)}{\sum_{i=1}^{n} P(x_i)} \]

  that we interpret to be the market share of premiums.

- The distributions are unchanged when we
    - rescale either (or both) losses ($y$) or premiums ($P(x_i)$) by a positive constant
    - transform relativities by any (strictly) increasing function
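The empirical distribution functions can be checked against the five-policyholder example. The following is a minimal Python sketch, not the authors' code: the function name `ordered_lorenz` and the helper `close` are hypothetical, and the sketch assumes no ties among the relativities (tied relativities would need to be merged into a single point). It also checks the two invariance properties by rescaling the losses and applying a strictly increasing transform to the relativities.

```python
from itertools import accumulate
from math import exp


def ordered_lorenz(losses, premiums, relativities):
    """Empirical (F_P, F_L) points of the ordered Lorenz curve.

    Hypothetical helper: policies are sorted by increasing relativity, then
    cumulative premium and loss shares are formed. Assumes no tied relativities.
    """
    order = sorted(range(len(losses)), key=lambda i: relativities[i])
    cum_prem = list(accumulate(premiums[i] for i in order))
    cum_loss = list(accumulate(losses[i] for i in order))
    # Prepend the origin so the curve runs from (0, 0) to (1, 1).
    f_p = [0.0] + [c / cum_prem[-1] for c in cum_prem]
    f_l = [0.0] + [c / cum_loss[-1] for c in cum_loss]
    return f_p, f_l


def close(a, b, tol=1e-12):
    """True when two equal-length sequences agree element-wise within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Data from the n = 5 example.
losses = [5, 5, 5, 4, 6]
premiums = [4, 2, 6, 5, 8]
relativities = [5, 4, 3, 2, 1]

f_p, f_l = ordered_lorenz(losses, premiums, relativities)
# f_p is approximately [0, 0.32, 0.52, 0.76, 0.84, 1] (market share of premiums)
# f_l is approximately [0, 0.24, 0.40, 0.60, 0.80, 1] (market share of losses)

# Invariance: rescale losses by a positive constant and transform the
# relativities by a strictly increasing function; the curve should not move.
scaled_losses = [3 * y for y in losses]
transformed = [exp(r) for r in relativities]
g_p, g_l = ordered_lorenz(scaled_losses, premiums, transformed)
same_curve = close(f_p, g_p) and close(f_l, g_l)
print(list(zip(f_p, f_l)), same_curve)
```

Only the ordering induced by the relativities enters the calculation, which is why any strictly increasing transform leaves the points unchanged.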
Another Example

- Here is a graph of $n = 35{,}945$ contracts, a 1 in 10 random sample of an example that will be introduced later
- To read the Lorenz Curve
    - Pick a point on the horizontal axis, say 60% of premiums
    - The corresponding vertical axis is about 50% of losses
    - This represents a profitable situation for the insurer
- The “line of equality” represents a break-even situation
- Summary measure: the Gini coefficient is (twice) the area between the line of equality and the Lorenz Curve
    - It is about 6.1% for this sample, with a standard error of 3.7%

[Figure: Ordered Lorenz Curve and Line of Equality. Vertical axis: Loss Distn. Horizontal axis: Premium Distn.]

Insurance Scoring

- Policies are profitable when expected claims are less than premiums
- Expected claims are unknown but we will consider one or more candidate insurance scores, $S(x)$, that are approximations of the expectation
- We are most interested in policies where $S(x_i) < P(x_i)$
- One measure (that we focus on) is the relative score

\[ R(x_i) = \frac{S(x_i)}{P(x_i)}, \]

  that we call a relativity.

- This is not the only possible measure. Might consider $R(x_i) = S(x_i) - P(x_i)$.

Ordered Lorenz Curve Characteristics

- Additional notation: Define $m(x) = E(y \mid x)$, the regression function.
- Recall the distribution functions

\[ F_L(s) = \frac{E[\, y \, I(R \le s)\,]}{E\, y} \quad \text{and} \quad F_P(s) = \frac{E[\, P(x) \, I(R \le s)\,]}{E\, P(x)} \]

1. Independent Relativities. Relativities that provide no information about the premium or the regression function
    - Assume that $\{R(x)\}$ is independent of $\{m(x), P(x)\}$.
    - Then, $F_L(s) = F_P(s) = \Pr(R \le s)$ for all $s$, resulting in the line of equality.
2. No Information in the Scores
    - Premiums have been determined by the regression function so that $P(x) = m(x)$.
    - Scoring adds no information: $F_P(s) = F_L(s)$ for all $s$, resulting in the line of equality.

Ordered Lorenz Curve Characteristics

3. A Regression Function is a Desirable Score.
    - Suppose that $S(x) = m(x)$.
    - Then, the ordered Lorenz curve is convex (concave up).
    - This means that it has a positive (non-negative) Gini index.

[Figure: a convex ordered Lorenz curve and the Line of Equality. Vertical axis: $F_L(s)$ - Losses. Horizontal axis: $F_P(s)$ - Premiums.]
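As a rough numerical illustration of the relativity $R = S/P$ and of the Gini coefficient as twice the area between the line of equality and the ordered Lorenz curve, here is a minimal Python sketch. It is not the authors' estimator: the trapezoidal area rule, the function name `gini_ordered`, and the score values are assumptions made for illustration. Losses and premiums are taken from the five-policyholder example earlier in the deck.

```python
def gini_ordered(losses, premiums, scores):
    """Hypothetical helper: Gini coefficient of the ordered Lorenz curve.

    Policies are ordered by the relativity R = S / P (no ties assumed). The
    area under the curve is approximated with the trapezoidal rule, which is
    an assumption of this sketch, and the Gini coefficient is returned as
    1 - 2 * (area under the curve), i.e. twice the area between the line of
    equality and the curve.
    """
    relativities = [s / p for s, p in zip(scores, premiums)]
    order = sorted(range(len(losses)), key=lambda i: relativities[i])
    total_prem = sum(premiums)
    total_loss = sum(losses)
    f_p = f_l = 0.0          # running market shares of premiums and losses
    area_under = 0.0
    for i in order:
        next_p = f_p + premiums[i] / total_prem
        next_l = f_l + losses[i] / total_loss
        area_under += (next_p - f_p) * (next_l + f_l) / 2.0
        f_p, f_l = next_p, next_l
    return 1.0 - 2.0 * area_under


losses = [5, 5, 5, 4, 6]
premiums = [4, 2, 6, 5, 8]
# Made-up scores, chosen so that the relativities are 1.5, 1.2, 0.9, 0.6, 0.5.
scores = [6.0, 2.4, 5.4, 3.0, 4.0]

gini_example = gini_ordered(losses, premiums, scores)

# Break-even illustration: if every loss equals its premium, the loss and
# premium shares coincide and the curve is the line of equality.
gini_break_even = gini_ordered(premiums, premiums, scores)

print(round(gini_example, 4), round(abs(gini_break_even), 4))
```

Under these assumptions the example gives a coefficient of about 0.155, a positive value because the low-relativity policies carry a smaller share of losses than of premiums. Setting losses equal to premiums gives a coefficient of zero, the break-even case.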