(Machine) Learning To Detect Fraudsters Hany Elemary Sarah LeBlanc
CREDIT CARD FRAUD TRANSACTION APPLICATION CARD NOT FOUND 2
FRAUD DETECTION MODEL PLOT [Plot: Application Count vs. Model Score (0 to 1); series: Not Fraud, Fraud; marked regions: False Negatives, False Positives] 3
MINIMIZE LOSSES Lost Profitability = (Fraud Cost * FN) + (Opportunity Cost * FP) Legend: FN = false negatives (fraud missed), FP = false positives (mistaken fraud) 4
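A minimal sketch of the loss formula above, assuming applications are flagged as fraud when the model score is at or above a threshold. The function names, the cost figures, and the sample scores are hypothetical and do not come from the talk.

```python
def lost_profitability(fraud_cost, opportunity_cost, fn, fp):
    """Lost Profitability = (Fraud Cost * FN) + (Opportunity Cost * FP)."""
    return fraud_cost * fn + opportunity_cost * fp


def count_errors(scored, threshold):
    """scored: list of (model_score, is_fraud) pairs.

    Assumption: an application is flagged as fraud when score >= threshold.
    """
    fn = sum(1 for score, is_fraud in scored if is_fraud and score < threshold)
    fp = sum(1 for score, is_fraud in scored if not is_fraud and score >= threshold)
    return fn, fp


def best_threshold(scored, fraud_cost, opportunity_cost, thresholds):
    """Return the candidate threshold with the smallest lost profitability."""
    return min(
        thresholds,
        key=lambda t: lost_profitability(
            fraud_cost, opportunity_cost, *count_errors(scored, t)
        ),
    )


# Hypothetical scored applications: (model score, actually fraud?)
scored = [(0.1, False), (0.3, False), (0.4, True), (0.7, False), (0.9, True)]
fn, fp = count_errors(scored, threshold=0.5)
loss = lost_profitability(fraud_cost=500, opportunity_cost=50, fn=fn, fp=fp)
```

Moving the threshold trades false negatives against false positives, so the cheapest cut-off depends on how the two costs compare.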
CURRENT STATE Vendor Fraud Detection Rules Application Service Model Strategies Customer 5
PROPOSED STATE Vendor Fraud Detection Fraud Detection Rules Application Service Strategies Customer CHAMPION CHALLENGER MODELS 6
MODEL TRAINING Supervised Learning Fraud Classification Training Not Fraud Model Historical Data 7
DATA PATTERNS Filter Transform Impute Features 8
DATA FILTERING Low Cardinality 9
DATA FILTERING High Cardinality 10
DATA FILTERING Medium Cardinality 11
DATA FILTERING Medium Cardinality Predictive Model Training 12
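One reading of the filtering slides above is that columns with very few distinct values (low cardinality) or nearly unique values (high cardinality) are dropped, and medium-cardinality columns are kept for training. The sketch below illustrates that idea; the `min_distinct` and `max_ratio` cut-offs and the sample columns are assumptions, not values from the talk.

```python
def cardinality(rows, column):
    """Number of distinct values a column takes across all rows."""
    return len({row[column] for row in rows})


def medium_cardinality_columns(rows, min_distinct=2, max_ratio=0.5):
    """Keep columns that are neither near-constant nor near-unique.

    Assumed cut-offs (hypothetical):
      - fewer than `min_distinct` distinct values counts as low cardinality
      - distinct/rows above `max_ratio` counts as high cardinality
    """
    if not rows:
        return []
    kept = []
    for column in rows[0]:
        distinct = cardinality(rows, column)
        if distinct >= min_distinct and distinct / len(rows) <= max_ratio:
            kept.append(column)
    return kept


# Hypothetical application records
rows = [
    {"country": "US", "app_id": "a1", "device": "ios"},
    {"country": "US", "app_id": "a2", "device": "android"},
    {"country": "US", "app_id": "a3", "device": "ios"},
    {"country": "US", "app_id": "a4", "device": "android"},
]
selected = medium_cardinality_columns(rows)
```

Here `country` is constant and `app_id` is unique per row, so only `device` survives the filter.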
DATA TRANSFORMATION Fraud Status Email jack.smith@gmail.com annie.may@fraudster.com freddy.jr@gmail.com nicole.jack@fraudster.com jon.johnston@gmail.com claudia.penns@us.gov walter.carson@gmail.com ben.benjamin@fraudster.com 13
DATA TRANSFORMATION Domain name Fraud Status gmail.com fraudster.com gmail.com fraudster.com gmail.com us.gov gmail.com fraudster.com 14
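The transformation above, reducing each email address to its domain name, can be sketched as follows. The email addresses are the ones shown on the slide; the function names are hypothetical, and the fraud status column is omitted because its values are not recoverable from the slide text.

```python
from collections import Counter


def email_domain(email):
    """Return the lower-cased part after the last '@'.

    An address without '@' is returned unchanged (lower-cased).
    """
    return email.rsplit("@", 1)[-1].lower()


emails = [
    "jack.smith@gmail.com",
    "annie.may@fraudster.com",
    "freddy.jr@gmail.com",
    "nicole.jack@fraudster.com",
    "jon.johnston@gmail.com",
    "claudia.penns@us.gov",
    "walter.carson@gmail.com",
    "ben.benjamin@fraudster.com",
]

domains = [email_domain(e) for e in emails]
domain_counts = Counter(domains)
```

Eight distinct addresses collapse into three domains, which gives a model a repeated value it can learn from instead of a near-unique string.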
DATA IMPUTATION Handling Missing Data Column 1 Column 2 Column 3 Column 4 15
DATA IMPUTATION Handling Missing Data Column 1 Column 2 Column 3 Column 4 16
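The slides do not say which imputation strategy was used. As one common possibility, the sketch below fills missing numeric values with the column mean and missing categorical values with the most frequent value. Representing missing data as `None`, the function name, and the sample columns are all assumptions.

```python
from statistics import mean, mode


def impute_column(values):
    """Fill None entries: mean for numeric columns, mode otherwise.

    This is an assumed strategy for illustration, not the one from the talk.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = mean(present) if is_numeric else mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps
income = impute_column([10, None, 20])
device = impute_column(["ios", "android", None, "ios"])
```

Whichever strategy is chosen, the fill values should be computed from training data and reused unchanged at scoring time.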
FEATURE SELECTION IP to Zip Proximity 17
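A feature such as IP to Zip proximity could be computed as the great-circle distance between the location an IP address resolves to and the centroid of the applicant's zip code. The sketch below uses the haversine formula and assumes both locations are already available as (latitude, longitude) pairs; the lookup step, the function names, and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ip_to_zip_proximity_km(ip_location, zip_location):
    """Distance between an IP-derived location and a zip code centroid.

    Both arguments are (latitude, longitude) tuples obtained elsewhere.
    """
    return haversine_km(*ip_location, *zip_location)


# Hypothetical example: IP resolves near New York, zip code is in Los Angeles
distance = ip_to_zip_proximity_km((40.7128, -74.0060), (34.0522, -118.2437))
```

A large distance between where the applicant claims to live and where the request originates is the kind of signal such a feature is meant to expose.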
ARCHITECTURE DATA SCIENTIST WORKFLOW Raw Data Transformed Data Trained Model DEVELOPER WORKFLOW Applications Trained Model Score 18
DATA SCIENTIST WORKFLOW Raw Data Transformed Data Trained Model Clean Transform Impute Historical Data Store Binary Repository 19
DEVELOPER WORKFLOW Applications Trained Model Score Vendor Rules Application Model Service Strategies Decisioning & Analytics Platform Model Predictions Store Message Queue Model 1 Model 1 Model 2 Model 3 Binary Repository Shadow Mode 20
DEVELOPER WORKFLOW Applications Trained Model Score Vendor Rules Application Service Strategies Decisioning & Analytics Platform Model Predictions Store Message Queue Champion Model 2 Model 1 Model 3 Model Binary Repository Shadow Mode 21
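The champion/challenger arrangement in shadow mode can be sketched as a service that scores every application with all models, records each prediction, and returns only the champion's score for decisioning. The class below is an in-memory illustration: the `ModelService` name, the list standing in for the predictions store, and the lambda models are assumptions, and the message queue from the diagram is left out.

```python
class ModelService:
    """Scores with the champion; challengers run in shadow mode."""

    def __init__(self, models, champion):
        if champion not in models:
            raise KeyError(champion)
        self.models = dict(models)  # name -> callable(application) -> score
        self.champion = champion
        self.predictions = []       # stand-in for the model predictions store

    def score(self, application):
        decision_score = None
        for name, model in self.models.items():
            is_champion = name == self.champion
            try:
                value = model(application)
            except Exception:
                if is_champion:
                    raise           # a champion failure must surface
                continue            # a shadow failure must not affect the decision
            self.predictions.append(
                {"model": name, "score": value, "shadow": not is_champion}
            )
            if is_champion:
                decision_score = value
        return decision_score

    def promote(self, name):
        """Make a challenger the new champion after evaluation."""
        if name not in self.models:
            raise KeyError(name)
        self.champion = name


def broken_model(application):
    raise RuntimeError("challenger crashed")


service = ModelService(
    {"model_1": lambda app: 0.2, "model_2": lambda app: 0.9, "model_3": broken_model},
    champion="model_1",
)
first = service.score({"id": 1})
service.promote("model_2")
second = service.score({"id": 2})
```

Because shadow predictions are stored but never returned, challengers can be compared against the champion on live traffic without changing any customer-facing decision.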
ARCHITECTURE DATA SCIENTIST WORKFLOW DEVELOPER WORKFLOW 22
VALUE STREAM Data Ingestion Model Training Governance Publish Service Publish Model Shadow Mode Governance Evaluation Champion Model 100 75 50 25 23
THANK YOU Sarah LeBlanc Hany Elemary sleblanc@thoughtworks.com helemary@thoughtworks.com @sarah_g_leblanc @hanyelemary Questions? 24