ACTUARIES & DATA SCIENCE
Jerome Tuttle, FCAS, CPCU
Retired Actuary
What is an actuary?
• The mathematicians of the insurance industry.
• A business professional who deals with the financial impact of risk and uncertainty.
• Analyzes, manages, and measures the financial impact of risk and uncertainty.
• Develops and validates models and communicates results to guide decision-making.
• Actuaries in movies:
  ■ Jack Nicholson – About Schmidt (2002)
  ■ Ben Stiller – Along Came Polly (2004)
Insurance is a unique business
• We don’t know our cost (claims) when we sell the policy, and for some claims we don’t know the final cost for many years.
• We are not required to sell to everyone – similar to bank loans and college admissions.
• We do not charge the same price to everyone. This is REQUIRED by law, e.g., FL Statute 627.062: Rates may not be unfairly discriminatory.
  ■ A rate is unfairly discriminatory to a group of risks if the rate does not bear a reasonable relationship to the expected loss experience among the risks.
What is data science?
The intersection of math/stats, computer science, and subject-matter knowledge, used to extract meaningful insights from data and translate them into tangible business value.
Examples of data science
• Internet search engine algorithms.
• Targeted advertising and recommendations.
• Target stores sent diaper coupons to the pregnant teenager before she told her father. (Folklore?)
• Moneyball and sports analytics.
• Better singles matching on dating websites.
• Disease diagnosis, personalized healthcare recommendations.
• Data-driven crime prediction, facial recognition, terrorist forecasts.
• Which tweets did Trump write, and which did his staff write?
Actuaries and data science
• “Actuaries were among the first data scientists.” (Colin Priest, actuary turned data scientist at DataRobot, Singapore.)
• Actuaries are strongest at math/stat and domain knowledge (we study insurance, besides math/stat).
• Data scientists are strongest at computer science, especially coding, data manipulation and joining tables, the theory of machine learning (training versus testing, overtraining), and machine learning algorithms.
• Actuarial exams now include: generalized linear models, K-nearest neighbors, K-means clustering, Bayes classifier, decision trees, random forest, principal component analysis. There is also a predictive analytics specialty.
Randomly split data into training versus testing data
RMSE on test data = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Predicted}_i\right)^2}$, where n is the number of records in the test data.
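A minimal Python sketch of the split-then-score workflow. The data set, the one-variable least-squares line used as the model, the 25% test fraction, and the helper names `split_train_test`, `fit_line`, and `rmse` are all illustrative assumptions, not the presentation's method. The point is only that the model is fit on the training records and the RMSE formula is evaluated on the held-out test records.

```python
import math
import random


def split_train_test(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(actual, predicted):
    """Root mean squared error: sqrt(sum((actual - predicted)^2) / n)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical data: (miles driven in thousands, claim cost) with random noise
noise = random.Random(0)
records = [(x, 50 + 10 * x + noise.gauss(0, 5)) for x in range(1, 21)]

train, test = split_train_test(records)
a, b = fit_line([x for x, _ in train], [y for _, y in train])  # fit on training data only
test_rmse = rmse([y for _, y in test], [a + b * x for x, _ in test])  # score on test data
print(f"fitted line: y = {a:.1f} + {b:.1f}x, test RMSE = {test_rmse:.2f}")
```

Scoring on records the model never saw is what guards against overtraining: a model that memorizes the training data can show a small training error but a large test RMSE.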
Some actuarial examples of data science techniques
• If predictive modeling refers to estimating insurance costs, then actuaries have been doing this forever.
• Today predictive modeling is computationally intensive, often testing all possible permutations of variables, transformations, etc.
• The 2 broad categories in data science are prediction and classification. Classification is predicting a category.
• Prediction often involves some type of regression. Linear regression is being replaced by more flexible Generalized Linear Models.
• Classification includes:
  ■ Decision trees: underwriting
  ■ Clustering: territories
  ■ Principal component analysis: detecting fraud
• In the following examples, assume n independent variables and p data values.
For insurance rating, we group (hopefully) similar customers into classes and charge an average rate for the class. Classification is rarely perfect.
[Figure: Before classification | After classification]
Insurance classes may include age, gender, urban/rural territory, marital status, miles driven, claims history, car type, car age, etc. But within each n-dimensional slice (one combination of class values), there is still considerable variability. A company wants to choose the better-than-average customers within each class to make a profit.
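To make the idea concrete, here is a small Python sketch that treats each combination of two class variables (an age band and a territory, so n = 2) as one rating class, charges the class average loss as its rate, and reports the standard deviation within the class. The policy records, class labels, and the `class_rates` function are hypothetical; the spread column is the within-class variability the slide describes.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical policy records: (age band, territory, annual loss)
policies = [
    ("young", "urban", 1200.0), ("young", "urban", 300.0), ("young", "urban", 2100.0),
    ("young", "rural", 400.0), ("young", "rural", 900.0),
    ("adult", "urban", 500.0), ("adult", "urban", 100.0), ("adult", "urban", 900.0),
    ("adult", "rural", 200.0), ("adult", "rural", 400.0),
]


def class_rates(records):
    """Group records by rating class; return each class's average loss and spread."""
    losses_by_class = defaultdict(list)
    for age, territory, loss in records:
        losses_by_class[(age, territory)].append(loss)
    return {cls: (mean(losses), pstdev(losses)) for cls, losses in losses_by_class.items()}


rates = class_rates(policies)
for cls, (avg, spread) in sorted(rates.items()):
    print(cls, f"average = {avg:.0f}", f"std dev = {spread:.0f}")
```

Every policyholder in a class pays the same average, so a competitor who can tell the low-loss members of the class from the high-loss ones can offer them a lower price.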
Generalized Linear Models: pricing
• Traditionally we used classical linear regression, and we treated our pricing by class as multiplicative:
  ■ Base rate = $100
  ■ Times factor for Age i = 1.50
  ■ Times factor for Gender j = 1.20
  ■ Times factor for Territory k = 1.40, ..., etc.
• This disregards interactions between classes and makes assumptions of normality and common variances.
• GLMs consist of a wider range of models, with the response variable assumed to be a member of the exponential family.
• Using GLMs results in some factors being reduced and others increased.
• Other applications of GLMs:
  ■ Effect of telematics on claims
  ■ Underwriting score cards
  ■ Predicting claims likely to settle far above their initial estimate
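The multiplicative rating arithmetic can be sketched in Python using the slide's own base rate and factors, treating them as the factors that apply to one hypothetical policyholder. The function name `multiplicative_rate` is illustrative. Because a product of factors equals exp of a sum of log-factors, a GLM with a log link produces relativities in exactly this multiplicative form.

```python
import math

BASE_RATE = 100.00  # base rate from the slide

# Class factors from the slide, for one hypothetical policyholder
factors = {"age": 1.50, "gender": 1.20, "territory": 1.40}


def multiplicative_rate(base, relativities):
    """Rate = base rate times the product of every applicable class factor."""
    return base * math.prod(relativities.values())


premium = multiplicative_rate(BASE_RATE, factors)
print(f"premium = ${premium:.2f}")  # prints: premium = $252.00

# Equivalent log-scale form: log(rate) = log(base) + sum of log(factors)
log_premium = math.log(BASE_RATE) + sum(math.log(f) for f in factors.values())
print(f"via logs = ${math.exp(log_premium):.2f}")
```

So $100 × 1.50 × 1.20 × 1.40 = $252. Adding another rating variable just multiplies in one more factor.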
Decision trees: underwriting
• Sequentially splits data into categories having similar values of the dependent variable.
• Uses a statistic such as the Gini index to choose each split.
• Possible variables: number of years renewed, occupation, premium payment history, telematics (speed, braking, time of day, etc.)
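A minimal Python sketch of how the Gini index picks a split on one numeric variable. Gini impurity is 1 minus the sum of squared class proportions, so a node holding only one outcome scores 0. The data (years renewed versus whether a claim occurred) and the function names `gini` and `best_split` are hypothetical. A full tree would repeat this search over every variable and then recurse into each child node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, labels):
    """Try each threshold on one numeric variable and return the one giving the
    lowest size-weighted Gini impurity of the two child nodes."""
    n = len(labels)
    best = (None, gini(labels))  # (threshold, impurity); None = no split beats the parent
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best[1]:
            best = (t, impurity)
    return best


# Hypothetical underwriting data: years renewed vs. whether a claim occurred
years_renewed = [0, 1, 1, 2, 3, 5, 6, 8, 10, 12]
had_claim = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]
threshold, impurity = best_split(years_renewed, had_claim)
print(threshold, round(impurity, 3))  # prints: 3 0.16
```

Here the parent node has impurity 0.48; splitting at "3 or fewer years renewed" leaves a pure no-claim group on the right and drops the weighted impurity to 0.16.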
Clustering: territories
• Partitions data into classes based on how closely the data is grouped. Iteratively updates the centers and re-partitions.
• There is no dependent variable.
• Another use is clustering similar occupations.
• Florida has 28 rating territories in auto.
Yao, J. (2008). Clustering in ratemaking; applications in territories clustering. Casualty Actuarial Society Predictive Modeling Seminar.
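The partition-and-update loop described above is the k-means algorithm (listed among the exam topics earlier). Below is a minimal pure-Python sketch on hypothetical 2-D points, which you might think of as scaled map coordinates of zip codes. The data, the choice of k = 2, and the function names are assumptions for illustration; a real territory study would involve considerably more than this.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: assign each point to its nearest center, move each center
    to the mean of its assigned points, and repeat until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(max_iter):
        # Partition step: label each point with the index of its nearest center
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: recompute each center as the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return centers, labels


# Hypothetical points, e.g. scaled zip-code coordinates, forming two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centers, labels = kmeans(points, k=2)
print(sorted(centers))  # prints: [(1.5, 1.5), (8.5, 8.5)]
```

Note that no dependent variable appears anywhere: the algorithm only looks at distances between points, which is why clustering is described as having none.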
Recommend
More recommend