



  1. Presenter: Daymond Ling, Professor, Seneca College
     Daymond Ling is a Professor at Seneca College of Applied Technology and Arts, where he has been teaching advanced analytics and machine learning in the School of Marketing since 2015. Prior to teaching, he was Senior Director, Advanced Analytics at Canadian Imperial Bank of Commerce (CIBC), where from 1996 he focused on solving all manner of marketing analytics problems related to customer relationship management. Before CIBC he worked in Risk Management at American Express Canada. Daymond received his M.Sc. in Operations Research and his B.Sc. in Physics Honours from the University of British Columbia. He started using SAS in 1980 and has continued ever since.

  2. 1489-2017 Timing is Everything: Detecting Important Behaviour Triggers

  3. Predictive Analytics
     If the future looks like the past, then past patterns can be used to predict the future.
     Timeline: Past | Now | Future
     - Historical Window: historical information up to the current period, used to find a pattern that strongly correlates with the target definition
     - Prediction Window: target definition of the desired outcome within some timeframe
     But Life is Dynamic. What if the process is non-stationary and has changed?

  4. Think Outside The Box
     Compare different people at the same time:
     1. Who is likely to buy
     2. Who has more money
     Follow the same person in time:
     1. Who changed?
     2. What was the change?
     Change your perspective. Ask different questions to get new insights.

  5. Change Point Analysis Shift in Mean

  6. Change Point Analysis
     Change Point Analysis is the problem of estimating the point at which some statistical property changes. This presentation will focus on change in the mean, i.e., the average has shifted.
     Economy (Hundreds):
     - Gross Domestic Product
     - Stock Market Index
     Corporate Performance Metrics (Hundreds):
     - Number of clients
     - Portfolio balance
     Customer Details (Tens of Millions):
     - EFT payroll of a chequing account
     - Your monthly credit card spend

  7. Did the Mean shift?
     Naïve rule: mean(period 2) differs from mean(period 1) by 20%.
     These graphs meet the 20+% change, but they are False Positives. The naïve rule also generates False Negatives.
     Statistical tests of mean difference, e.g., the two-sample t-test, involve the ratio of the mean difference to the standard deviation; they are not based on the mean difference only.
     The issue with the naïve rule is that it does not take data variability into account. The decision rule must be modified to take variability into consideration.
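
To make the contrast on this slide concrete, here is a minimal Python sketch (not the presenter's SAS code). The function names `naive_rule` and `welch_t` and all sample values are invented for illustration, and the Welch form of the t statistic is an assumption, since the slide only says "two-sample t-test".

```python
from math import sqrt
from statistics import mean, variance

def naive_rule(period1, period2, threshold=0.20):
    """Naive rule: flag a shift when the relative change in the mean reaches the threshold."""
    m1, m2 = mean(period1), mean(period2)
    return abs(m2 - m1) / abs(m1) >= threshold

def welch_t(period1, period2):
    """Two-sample (Welch) t statistic: mean difference divided by its standard error."""
    standard_error = sqrt(variance(period1) / len(period1)
                          + variance(period2) / len(period2))
    return (mean(period2) - mean(period1)) / standard_error

# Illustrative data (assumed): noisy periods whose means differ by 25%.
noisy_1 = [2, 18, 5, 15, 10, 1, 19, 10]
noisy_2 = [25, 3, 20, 4, 22, 6, 12, 8]
# Illustrative data (assumed): quiet periods whose means differ by only 10%.
quiet_1 = [100.0, 100.2, 99.8, 100.1, 99.9, 100.0]
quiet_2 = [110.0, 110.1, 109.9, 110.2, 109.8, 110.0]

print(naive_rule(noisy_1, noisy_2), welch_t(noisy_1, noisy_2))
print(naive_rule(quiet_1, quiet_2), welch_t(quiet_1, quiet_2))
```

In the noisy pair the naive rule fires even though the t statistic is small (a candidate false positive); in the quiet pair the naive rule stays silent even though the t statistic is very large (a candidate false negative).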

  8. Detect Mean Shift via CUSUM Range
     CUSUM: cumulative sum of deviations of the centered series
     - If deviations are random, they tend to cancel out, resulting in a small CUSUM range
     - Shifted sections have deviations of the same sign, so the CUSUM will move away from zero
     Decision rule: a large CUSUM range is indicative of a shifted mean
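
The CUSUM range described on this slide can be sketched in a few lines of Python. This is an illustration rather than the presenter's implementation: the helper names `cusum` and `cusum_range` and the two sample series are assumptions, and the path is started at zero by convention.

```python
from itertools import accumulate
from statistics import mean

def cusum(series):
    """Cumulative sum of deviations from the series mean, with a leading 0."""
    centre = mean(series)
    return [0.0] + list(accumulate(x - centre for x in series))

def cusum_range(series):
    """Distance between the highest and lowest points of the CUSUM path."""
    path = cusum(series)
    return max(path) - min(path)

stable = [10, 12, 9, 11, 10, 12, 9, 11]     # fluctuates around one level
shifted = [10, 12, 9, 11, 20, 22, 19, 21]   # second half moved up by 10

print(cusum_range(stable))    # 1.5
print(cusum_range(shifted))   # 20.0
```

In the stable series the deviations alternate in sign and cancel, so the path stays near zero. In the shifted series the first four deviations are all negative, so the path drifts to -20 before returning to zero.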

  9. How Large is Large?
     Leverage the variability of the empirical data to determine "Large":
     - Calculate the empirical CUSUM range
     - Calculate the CUSUM range distribution by randomly shuffling the data many times
     - The p-value of the empirical CUSUM range is the proportion of the shuffled distribution >= the empirical value
     By using actual data variability to perform the significance test, False Positives and False Negatives can be minimized.
     Natural data variation determines whether the empirical pattern is unusual.
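
A hedged Python sketch of the shuffle test outlined above follows. The function name `shuffle_p_value`, the default of 1,000 shuffles, the fixed seed, and the sample series are all assumptions for illustration; the presenter's SAS process may differ in detail.

```python
import random
from itertools import accumulate
from statistics import mean

def cusum_range(series):
    """Range of the cumulative sum of deviations from the mean."""
    centre = mean(series)
    path = [0.0] + list(accumulate(x - centre for x in series))
    return max(path) - min(path)

def shuffle_p_value(series, n_shuffles=1000, seed=0):
    """Proportion of shuffled series whose CUSUM range is >= the empirical one."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    empirical = cusum_range(series)
    shuffled = list(series)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # destroys time order, keeps the values
        if cusum_range(shuffled) >= empirical:
            at_least += 1
    return at_least / n_shuffles

stable = [10, 12, 9, 11] * 4
shifted = [10, 12, 9, 11, 10, 12, 9, 11, 20, 22, 19, 21, 20, 22, 19, 21]

print(shuffle_p_value(stable))    # large p-value: ordering is unremarkable
print(shuffle_p_value(shifted))   # small p-value: ordering is unusual
```

Shuffling keeps the observed values, and therefore their variability, while breaking any ordering in time, so the test asks whether the observed ordering is unusual for this particular data.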

  10. When Did It Happen?
      Two estimators for the location of the change:
      1. Change occurred at the point of Max Absolute CUSUM
         - Simple to compute
         - A little less precise
      2. Change occurred at the point of Minimum Variance
         - More complex calculation
         - More precise
      Recursively split a time series into many sections.
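
Both location estimators can be sketched briefly in Python. The names `split_by_max_abs_cusum`, `split_by_min_variance` and `rss`, and the sample series, are hypothetical. "Minimum variance" is read here as minimising the combined residual sum of squares of the two segments, which is an assumption based on the later slide on computation.

```python
from itertools import accumulate
from statistics import mean

def split_by_max_abs_cusum(series):
    """Index of the first point of the new regime, taken from the largest |CUSUM|."""
    centre = mean(series)
    path = list(accumulate(x - centre for x in series))  # path[i]: CUSUM after point i
    last_old = max(range(len(path) - 1), key=lambda i: abs(path[i]))
    return last_old + 1

def rss(segment):
    """Residual sum of squares of a segment around its own mean."""
    centre = mean(segment)
    return sum((x - centre) ** 2 for x in segment)

def split_by_min_variance(series):
    """Split index k that minimises RSS(series[:k]) + RSS(series[k:])."""
    return min(range(1, len(series)),
               key=lambda k: rss(series[:k]) + rss(series[k:]))

series = [10, 12, 9, 11, 10, 20, 22, 19, 21, 20]
print(split_by_max_abs_cusum(series))   # 5
print(split_by_min_variance(series))    # 5
```

On this clean example both estimators agree that the new level starts at index 5. Applying the same split again to each side, as long as the split is significant, is one way to obtain the recursive segmentation the slide mentions.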

  11. Reaction Speed to New Change

      | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | Prob |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | 10.53 | 12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 0.15 |
      | 12.15 | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 0.39 |
      | 11.64 | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 0.77 |
      | 8.89 | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 0.93 |
      | 13.08 | 7.53 | 9.72 | 12.08 | 10.13 | 12.45 | 9.70 | 10.58 | 20 | 20 | 20 | 20 | 0.99 |

      - The first series has no change. Probability of mean shift = 0.15, insignificant.
      - When the series is shifted by one and a single large value is appended, the probability increases to 0.39, still insignificant. CPA is robust to a single spike.
      - Shifting the series by two and appending two large values raises the probability to 0.77. CPA is signaling a heightened likelihood of mean shift; odds are about 3:1.
      - Shifting by three and appending three large values raises the probability to 0.93. With four consecutive large values, the probability is 0.99.
      For short time series, a section of three or more shifted values has P >= 0.90.
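
The experiment on this slide can be approximated with the following Python sketch. It assumes that "Prob" is one minus the shuffle p-value of the CUSUM range; the names `shift_probability` and `windows`, the seed, and the 2,000 shuffles are illustrative choices. Because the result depends on random shuffles and on details the slide does not give, the printed values should not be expected to match the slide exactly.

```python
import random
from itertools import accumulate
from statistics import mean

# The twelve values of the first series on the slide.
BASE = [10.53, 12.15, 11.64, 8.89, 13.08, 7.53,
        9.72, 12.08, 10.13, 12.45, 9.70, 10.58]

def cusum_range(series):
    """Range of the cumulative sum of deviations from the mean."""
    centre = mean(series)
    path = [0.0] + list(accumulate(x - centre for x in series))
    return max(path) - min(path)

def shift_probability(series, n_shuffles=2000, seed=0):
    """Assumed definition: share of shuffles with a smaller CUSUM range."""
    rng = random.Random(seed)
    empirical = cusum_range(series)
    shuffled = list(series)
    smaller = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < empirical:
            smaller += 1
    return smaller / n_shuffles

def windows(base, spike=20, max_spikes=4):
    """Drop the k oldest values and append k spike values, for k = 0..max_spikes."""
    return [base[k:] + [spike] * k for k in range(max_spikes + 1)]

for k, series in enumerate(windows(BASE)):
    print(k, round(shift_probability(series), 2))
```

The qualitative behaviour to look for is the one the slide describes: a low probability for the unshifted series and a high one once several consecutive large values have arrived.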

  12. Decision Logic
      - Statistical Significance: no False Positive, no False Negative
      - Business Significance: magnitude of change interesting to business
      - Events of Interest: true events for investigation and intervention
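
One plausible reading of this slide is that an event of interest must pass both a statistical and a business test. The Python sketch below illustrates that reading only. The `ChangeEvent` fields, the 0.95 probability cut-off, and the 10% magnitude cut-off are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    account_id: str
    probability: float   # statistical confidence that the mean shifted
    old_mean: float      # level before the change point
    new_mean: float      # level after the change point

def events_of_interest(events, min_probability=0.95, min_relative_change=0.10):
    """Keep events that are both statistically and business significant (assumed thresholds)."""
    selected = []
    for event in events:
        statistically_significant = event.probability >= min_probability
        relative_change = abs(event.new_mean - event.old_mean) / abs(event.old_mean)
        business_significant = relative_change >= min_relative_change
        if statistically_significant and business_significant:
            selected.append(event)
    return selected

events = [
    ChangeEvent("A", 0.99, 2000.0, 2600.0),  # significant and large
    ChangeEvent("B", 0.99, 2000.0, 2040.0),  # significant but too small to matter
    ChangeEvent("C", 0.60, 2000.0, 3000.0),  # large but not significant
]
print([event.account_id for event in events_of_interest(events)])  # ['A']
```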

  13. Pay Increase

  14. Large Scale Computation
      Numerically intensive computation:
      1. Hundreds of millions of time series
      2. Random shuffle significance test
      3. Minimization of Residual Sum of Squares
      End-to-end process built in SAS:
      - Handles varying length time series
      - 28 code modules including data prep
      - 2,000 lines of code
      - In-memory processing for efficiency (in-memory array data structures)

  15. EFT Payroll Increase
      - Process 155 Million payroll events
      - Each month detect 1,200-1,500 clients
      - Can detect large pay spikes, e.g., lump sum payments

      | Step | Records | CPU Time | Elapsed |
      |---|---|---|---|
      | 1. Extract four years of EFT payroll | 155 Million records | 18 minutes | 25 minutes |
      | 2. Aggregation (customers with multiple accounts) | 145 Million records | 2 minutes | 2 minutes |
      | 3. Eliminate low pay and closed accounts | 85 Million records | 5 minutes | 5 minutes |
      | 4. Pay frequency determination | 62 Million records | 2 minutes | 2 minutes |
      | 5. Remove irregular off cycle pay | 60 Million records | 5 minutes | 5 minutes |
      | 6. Kalman Filter smoothing of pay spikes | 60 Million records | 4 minutes | 4 minutes |
      | 7. Change Point Detection | 60 Million records | 7 hours | 7 hours |
      | 8. Event Selection | 17K Accounts | 1 minute | 1 minute |
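
The slide does not say how the Kalman filter in step 6 is configured. As a rough illustration only, the Python sketch below runs a one-dimensional local-level Kalman filter forward over a pay series. The function name `kalman_filter`, the variance settings, and the pay amounts are assumptions, not the presenter's values.

```python
def kalman_filter(series, process_var=1.0, measurement_var=100.0):
    """Forward pass of a local-level Kalman filter (assumed model and variances)."""
    level = float(series[0])        # initial estimate of the underlying pay level
    uncertainty = measurement_var   # initial variance of that estimate
    filtered = [level]
    for observed in series[1:]:
        uncertainty += process_var                       # predict: level may drift
        gain = uncertainty / (uncertainty + measurement_var)
        level += gain * (observed - level)               # update towards the observation
        uncertainty *= 1.0 - gain
        filtered.append(level)
    return filtered

# Hypothetical pay series with a single lump-sum spike.
pay = [100, 100, 100, 100, 100, 500, 100, 100, 100, 100, 100]
print([round(value, 1) for value in kalman_filter(pay)])
```

With a measurement variance that is large relative to the process variance, a single spike moves the estimated level only part of the way, which is the damping effect one would want before running change point detection.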

  16. Payroll increase examples (clean)

  17. Payroll increase examples (noisy)

  18. More money in pocket means…
      In one year we identify 21K customers that grow funds by $300 Million and increase card spend by $50 Million.
      1. Payroll increases are more frequent for younger people, and they open more accounts
      2. Younger customers invest more, spend more, borrow more (new car, bigger house)
      3. Older people just save their additional income; they don't spend more or borrow more

      After Pay Increase (per-customer columns are in dollars; totals are in $Million):

      | Age | Customers | Account Increase | Asset (per customer) | Lending (per customer) | Card Spend (per customer) | Funds (total, $M) | Card Spend (total, $M) |
      |---|---|---|---|---|---|---|---|
      | 19 - 35 | 7,800 | 6.6% | $5,900 | $10,800 | $3,200 | $130 | $25 |
      | 35 - 45 | 5,700 | 4.3% | $5,500 | $5,400 | $2,400 | $62 | $15 |
      | 45 - 55 | 5,000 | 3.2% | $9,600 | $3,500 | $3,100 | $66 | $15 |
      | 55 - 65 | 2,500 | 2.2% | $16,100 | $600 | -$1,900 | $42 | -$5 |
      | All | 21,000 | 4.3% | $7,900 | $6,400 | $2,300 | $300 | $50 |

  19. Credit Card Spend

  20. Credit Card Spend Decrease
      Scenario:
      - Portfolio of 4 million+ credit cards
      - Annual spend volume low to plan by approximately $700 million
      Causes:
      - Reduced acquisition?
      - Increased attrition?
      - Slow down in economy?
      - Increased competition in Reward cards?
