Summarizing Insurance Scores Using a Gini Index
Edward W. (Jed) Frees
Joint work with Glenn Meyers and Dave Cummins
University of Wisconsin – Madison and Insurance Services Office
May 25, 2010

Outline
- The Ordered Lorenz Curve
- Insurance Scoring
- Effects of Model Selection
  - Under- and Over-Fitting
  - Non-Ordered Scores
  - Gini Coefficients for Rate Selection
- Statistical Inference
  - Estimating Gini Coefficients
  - Comparing Gini Coefficients

Research Motivation
- Would like to consider the degree of separation between insurance losses $y$ and premiums $P$
  - For a typical portfolio of policyholders, the distribution of premiums tends to be relatively narrow and skewed to the right
  - In contrast, losses have a much greater range
  - Losses are predominantly zeros (about 93% for homeowners) and, for $y > 0$, are also right-skewed
  - Difficult to use the squared error loss (mean square error) to measure discrepancies between losses and premiums
- We are proposing several new methods of determining premiums (e.g., instrumental variables, copula regression)
  - How to compare?
  - No single statistical model that could be used as an “umbrella” for likelihood comparisons
- Want a measure that not only looks at statistical significance but also monetary impact

The Lorenz Curve
- We consider methods that are variations of well-known tools in economics, the Lorenz curve and the Gini index.
- A Lorenz curve is a plot of two distributions
  - In welfare economics, the vertical axis gives the proportion of income (or wealth), the horizontal gives the proportion of people
  - See the example from Wikipedia

The Gini Index
- The 45 degree line is known as the “line of equality”
  - In welfare economics, this represents the situation where each person has an equal share of income (or wealth)
- To read the Lorenz curve
  - Pick a point on the horizontal axis, say 60% of households
  - The corresponding value on the vertical axis is about 40% of income
  - This represents income inequality
  - The farther the Lorenz curve is from the line of equality, the greater the amount of income inequality
- The Gini index is defined to be (twice) the area between the Lorenz curve and the line of equality

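The reading rule and the "twice the area" definition above can be sketched numerically. This is a minimal illustration, assuming a trapezoid approximation of the area; the income values and the function names `lorenz_points` and `gini_index` are hypothetical and do not come from the slides.

```python
def lorenz_points(incomes):
    """Cumulative (share of people, share of income) points, poorest first."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        points.append((k / n, running / total))
    return points


def gini_index(points):
    """Twice the area between the line of equality and the curve."""
    area_under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Trapezoid between two consecutive points of the curve.
        area_under_curve += (x1 - x0) * (y0 + y1) / 2.0
    # The area under the line of equality is 1/2, so 2 * (1/2 - area).
    return 1.0 - 2.0 * area_under_curve


incomes = [10, 20, 30, 40, 100]  # hypothetical incomes, not from the slides
points = lorenz_points(incomes)
gini = gini_index(points)
```

With these hypothetical incomes, the poorest 60% of people hold 30% of income (`points[3]`), and the Gini index is 0.4. Equal incomes put every point on the line of equality and give an index of zero.
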
The Ordered Lorenz Curve
- We consider an “ordered” Lorenz curve that varies from the usual Lorenz curve in two ways
  - Instead of counting people, think of each person as an insurance policyholder and look at the amount of insurance premium paid
  - Order losses and premiums by a third variable that we call a relativity
- Notation
  - Let $x_i$ be the set of characteristics (explanatory variables) associated with the $i$th contract
  - Let $P(x_i)$ be the associated premium
  - Let $y_i$ be the loss (often zero)
  - Let $R_i = R(x_i)$ be the corresponding relativity

The Ordered Lorenz Curve
- Notation: $x_i$ explanatory variables, $P(x_i)$ premium, $y_i$ loss, $R_i = R(x_i)$ relativity, $I(\cdot)$ indicator function, and $E(\cdot)$ mathematical expectation
- The ordered Lorenz curve
  - Vertical axis
    $$F_L(s) = \frac{E[\,y\, I(R \le s)\,]}{E\,y} \underset{\text{empirical}}{=} \frac{\sum_{i=1}^{n} y_i\, I(R_i \le s)}{\sum_{i=1}^{n} y_i}$$
    that we interpret to be the market share of losses.
  - Horizontal axis
    $$F_P(s) = \frac{E[\,P(x)\, I(R \le s)\,]}{E\,P(x)} \underset{\text{empirical}}{=} \frac{\sum_{i=1}^{n} P(x_i)\, I(R_i \le s)}{\sum_{i=1}^{n} P(x_i)}$$
    that we interpret to be the market share of premiums.
- The distributions are unchanged when we
  - rescale either (or both) losses ($y$) or premiums ($P(x_i)$) by a positive constant
  - transform relativities by any (strictly) increasing function

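The empirical versions of $F_L(s)$ and $F_P(s)$ and the two invariance properties can be checked with a short sketch. The data below are hypothetical, and the function name `market_shares` is an illustrative choice rather than anything defined in the slides.

```python
import math


def market_shares(losses, premiums, relativities, s):
    """Empirical (F_L(s), F_P(s)): shares of losses and premiums with R_i <= s."""
    loss_below = sum(y for y, r in zip(losses, relativities) if r <= s)
    premium_below = sum(p for p, r in zip(premiums, relativities) if r <= s)
    return loss_below / sum(losses), premium_below / sum(premiums)


# Hypothetical portfolio: most losses are zero, as is typical in insurance.
losses = [0.0, 0.0, 3.0, 0.0, 7.0]
premiums = [2.0, 1.0, 2.0, 3.0, 2.0]
relativities = [0.5, 1.5, 1.0, 0.8, 2.0]
s = 1.2

base = market_shares(losses, premiums, relativities, s)

# Rescale losses and premiums by positive constants, and pass relativities
# (and the threshold) through a strictly increasing function, here log.
transformed = market_shares(
    [100.0 * y for y in losses],
    [3.0 * p for p in premiums],
    [math.log(r) for r in relativities],
    math.log(s),
)
```

Here `base` is about (0.3, 0.7): the contracts with relativity at most 1.2 account for 30% of losses and 70% of premiums. `transformed` matches it, because rescaling cancels in each ratio and an increasing transform preserves which contracts fall below the threshold.
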
Example
Suppose we have only $n = 5$ policyholders

Variable $i$           1   2   3   4   5   Sum
Loss $y_i$             5   5   5   4   6    25
Premium $P(x_i)$       4   2   6   5   8    25
Relativity $R(x_i)$    5   4   3   2   1

[Figure: two panels. Left, "Lorenz": loss distribution (vertical axis) against people distribution (horizontal axis). Right, "Ordered Lorenz": loss distribution (vertical axis) against premium distribution (horizontal axis).]

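The points of the ordered Lorenz curve for this five-policyholder example can be reproduced by sorting on the relativity and accumulating shares. The sketch below uses the table values; the Gini computation at the end assumes a trapezoid approximation of the area, which this slide does not state, and the variable names are illustrative.

```python
losses = [5, 5, 5, 4, 6]
premiums = [4, 2, 6, 5, 8]
relativities = [5, 4, 3, 2, 1]

# Order the contracts by increasing relativity.
order = sorted(range(len(relativities)), key=lambda i: relativities[i])

total_loss = sum(losses)
total_premium = sum(premiums)

premium_share = [0.0]  # horizontal axis, F_P
loss_share = [0.0]     # vertical axis, F_L
cum_premium = 0
cum_loss = 0
for i in order:
    cum_premium += premiums[i]
    cum_loss += losses[i]
    premium_share.append(cum_premium / total_premium)
    loss_share.append(cum_loss / total_loss)

# Twice the area between the line of equality and the curve (trapezoid rule).
gini = 1.0 - sum(
    (p1 - p0) * (l0 + l1)
    for p0, p1, l0, l1 in zip(
        premium_share, premium_share[1:], loss_share, loss_share[1:]
    )
)
```

The premium shares come out as 0, 0.32, 0.52, 0.76, 0.84, 1 and the loss shares as 0, 0.24, 0.40, 0.60, 0.80, 1. The curve lies below the line of equality, and under the trapezoid assumption the Gini coefficient is about 0.155.
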
Another Example
- Here is a graph of $n = 35{,}945$ contracts, a 1 in 10 random sample of an example that will be introduced later
- To read the Lorenz curve
  - Pick a point on the horizontal axis, say 60% of premiums
  - The corresponding value on the vertical axis is about 50% of losses
  - This represents a profitable situation for the insurer
  - The “line of equality” represents a break-even situation
- Summary measure: the Gini coefficient is (twice) the area between the line of equality and the Lorenz curve
  - It is about 6.1% for this sample, with a standard error of 3.7%

[Figure: ordered Lorenz curve and line of equality. Vertical axis: loss distribution. Horizontal axis: premium distribution.]

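To put the reported standard error in context, the arithmetic below forms a rough 95% interval. It assumes the estimated Gini coefficient is approximately normally distributed, which this slide does not state; the symbols $\hat{G}$ and $\mathrm{se}$ are labels introduced here for the estimate and its standard error.

```latex
\hat{G} \pm 1.96\,\mathrm{se}(\hat{G})
  = 6.1\% \pm 1.96 \times 3.7\%
  \approx 6.1\% \pm 7.3\%
  = (-1.2\%,\; 13.4\%),
\qquad
\frac{\hat{G}}{\mathrm{se}(\hat{G})} = \frac{6.1}{3.7} \approx 1.65 .
```

Under that assumption the interval includes zero, so this 1 in 10 sample on its own does not clearly separate the curve from the break-even line.
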
Insurance Scoring
- Policies are profitable when expected claims are less than premiums
- Expected claims are unknown, but we will consider one or more candidate insurance scores, $S(x)$, that are approximations of the expectation
- We are most interested in policies where $S(x_i) < P(x_i)$
- One measure (that we focus on) is the relative score
  $$R(x_i) = \frac{S(x_i)}{P(x_i)},$$
  that we call a relativity.
- This is not the only possible measure. Might consider $R(x_i) = S(x_i) - P(x_i)$.

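A small sketch of the two candidate measures follows. The scores and premiums are hypothetical, and the function names are illustrative choices, not notation from the slides.

```python
def ratio_relativity(score, premium):
    """Relativity as the ratio of score to premium, R = S / P."""
    return score / premium


def difference_relativity(score, premium):
    """Alternative measure mentioned on the slide, R = S - P."""
    return score - premium


# Hypothetical scores S(x_i) and premiums P(x_i) for three contracts.
scores = [80.0, 120.0, 100.0]
premiums = [100.0, 100.0, 100.0]

ratios = [ratio_relativity(s, p) for s, p in zip(scores, premiums)]
differences = [difference_relativity(s, p) for s, p in zip(scores, premiums)]

# Contracts with S < P, the ones the insurer is most interested in,
# have a ratio below 1 (equivalently, a negative difference).
attractive = [r < 1.0 for r in ratios]
```

For these inputs the ratios are 0.8, 1.2 and 1.0, so only the first contract has a score below its premium. For positive premiums, the two measures agree on which contracts have $S < P$, although they can order contracts differently when premiums vary.
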
Ordered Lorenz Curve Characteristics
- Additional notation: Define $m(x) = E(y \mid x)$, the regression function.
- Recall the distribution functions
  $$F_L(s) = \frac{E[\,y\, I(R \le s)\,]}{E\,y} \quad \text{and} \quad F_P(s) = \frac{E[\,P(x)\, I(R \le s)\,]}{E\,P(x)}$$
1. Independent Relativities. Relativities that provide no information about the premium or the regression function
   - Assume that $\{R(x)\}$ is independent of $\{m(x), P(x)\}$.
   - Then, $F_L(s) = F_P(s) = \Pr(R \le s)$ for all $s$, resulting in the line of equality.
2. No Information in the Scores
   - Premiums have been determined by the regression function so that $P(x) = m(x)$.
   - Scoring adds no information: $F_P(s) = F_L(s)$ for all $s$, resulting in the line of equality.

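Both properties follow from conditioning on $x$. The derivation below is a sketch using only the definitions on this slide; the law of iterated expectations is the one tool added.

```latex
% Conditioning on x replaces y by the regression function m(x):
E[\,y\, I(R \le s)\,] = E\big[\,E(y \mid x)\, I(R(x) \le s)\,\big] = E[\,m(x)\, I(R \le s)\,],
\qquad E\,y = E\,m(x).

% Property 1: R independent of \{m(x), P(x)\}
F_L(s) = \frac{E[\,m(x)\,]\,\Pr(R \le s)}{E\,m(x)} = \Pr(R \le s),
\qquad
F_P(s) = \frac{E[\,P(x)\,]\,\Pr(R \le s)}{E\,P(x)} = \Pr(R \le s).

% Property 2: P(x) = m(x)
F_L(s) = \frac{E[\,m(x)\, I(R \le s)\,]}{E\,m(x)}
       = \frac{E[\,P(x)\, I(R \le s)\,]}{E\,P(x)} = F_P(s).
```

In both cases every point of the curve has equal coordinates, which is the line of equality.
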
Ordered Lorenz Curve Characteristics
3. A Regression Function is a Desirable Score
   - Suppose that $S(x) = m(x)$. Then the ordered Lorenz curve is convex (concave up).
   - This means that it has a non-negative Gini index.

[Figure: a convex ordered Lorenz curve below the line of equality. Vertical axis: $F_L(s)$, losses. Horizontal axis: $F_P(s)$, premiums.]

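A sketch of why convexity holds, assuming the ratio relativity $R(x) = S(x)/P(x)$ with positive premiums. The thresholds $s_1 < s_2$ are labels introduced here.

```latex
% With S = m, the relativity is R = m/P, so m(x) = R(x) P(x).
% For s_1 < s_2, on the event \{s_1 < R \le s_2\} we have s_1 P \le m \le s_2 P, hence
s_1\, E[\,P\, I(s_1 < R \le s_2)\,] \;\le\; E[\,m\, I(s_1 < R \le s_2)\,] \;\le\; s_2\, E[\,P\, I(s_1 < R \le s_2)\,].

% Dividing by E y and by F_P(s_2) - F_P(s_1) = E[\,P\, I(s_1 < R \le s_2)\,] / E\,P(x) (when positive):
s_1\, \frac{E\,P(x)}{E\,y}
  \;\le\; \frac{F_L(s_2) - F_L(s_1)}{F_P(s_2) - F_P(s_1)}
  \;\le\; s_2\, \frac{E\,P(x)}{E\,y}.
```

The slope of each chord is bracketed by its endpoints' thresholds, so slopes are non-decreasing as $s$ grows and the curve is convex. A convex curve from (0, 0) to (1, 1) lies on or below the line of equality, which gives the non-negative Gini index.
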
Ordered Lorenz Curve Characteristics
4. Regression Bound
   - Suppose that $S(x) = m(x)$, and total premiums equal total claims. Then
     $$F_L(s) \le s\, F_P(s).$$
   - The curve $(F_P(s),\, s F_P(s))$ is labeled as a “regression bound.”

[Figure: ordered Lorenz curve, regression bound, and line of equality. Vertical axis: $F_L(s)$, losses. Horizontal axis: $F_P(s)$, premiums.]

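The bound can be derived in a few lines. This sketch assumes the ratio relativity $R(x) = S(x)/P(x)$ with positive premiums and reads "total premiums equal total claims" as $E\,P(x) = E\,y$.

```latex
% With S = m, R = m/P, so on the event \{R \le s\} we have m(x) \le s\,P(x). Then
E[\,y\, I(R \le s)\,] = E[\,m(x)\, I(R \le s)\,] \;\le\; s\, E[\,P(x)\, I(R \le s)\,].

% Dividing by E y = E P(x):
F_L(s) = \frac{E[\,y\, I(R \le s)\,]}{E\,y}
  \;\le\; s\, \frac{E[\,P(x)\, I(R \le s)\,]}{E\,P(x)} = s\, F_P(s).
```

For $s < 1$ the bound sits strictly below the line of equality wherever $F_P(s) > 0$, which is where it is informative.
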