Predicting Real-Time Transaction Fraud Sami Niemi, PhD Barclays, Quantitative Analytics, Fraud Detection #StrataData - Predicting real-time transaction fraud using supervised learning
Contents
1 Background
2 Raw Data
3 Data Processing
4 Development
5 Validation
6 Implementation
7 Summary
Background – Definitions and Examples
3rd Party Fraud: an individual, or group of people, creates or uses a third party's identity in order to apply for products or take over an account without the consent or knowledge of the third party.
Card Transaction Fraud:
• Card Present (CP), e.g. lost, stolen, counterfeit/clone
• Card not Present (CnP), e.g. identity theft, hacking, fake online shops
Background – Motivation (global view)
Background – Motivation (UK view)
Source: Fraud the Facts 2018 by UK Finance
Background – Challenges
• Fraudsters Adapt and Invent New MOs
• Real-Time Runtime Requirements
• Front Page News Material
The aim of the project was to develop and implement new Debit CP and CnP real-time fraud detection models that can reduce fraud losses and protect genuine customers.
Raw Data – Sources
• Debit Card Transactions
• Non-Mon Events
• Confirmed Frauds
• Other Cards and Accounts
• Payment Instrument Info
• Customer Info
• Account Info
Data Processing – Quality Assurance and Data Exploration
Data Quality
• Reconciliation, Volumes, and Amounts
• Daily and Monthly Summary Statistics
• Anomaly and Outlier Detection
Exploration
• Trend Analysis and Anomaly Detection
• Distributions (PDFs and bar charts for Fraud / Non)
• Correlations (covariance, correlation w/ target, etc.)
Further steps
• Thresholding
• Issue Generation and Resolution Report
• Documentation and Governance
Data – High Level Statistics
Volumes
• Total: 220 – 300M debit card transactions with total value of £9 – 11B per month
• CP: 110M contactless, 20M ATM
• CnP: 85M e-commerce + telephony
Customers
• 10M unique customers per month
• transacting in 220 countries
• using 12M debit cards
• with 1.9M different merchants
Frauds
• Fraud Rates
• CP: less than 0.01%, depending on segment
• CnP: less than 0.15%, depending on segment
Development – Datasets
Debit: 14 months of data
• CP: Train 12 months; OOT (out-of-time) 2 recent months; Sample 45M transactions
• CnP: Train 12 months; OOT (out-of-time) 2 recent months; Sample 55M transactions
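The train / OOT arrangement above can be illustrated with a minimal sketch. It assumes each transaction carries a date and that the split is made on calendar months; the record schema and the function names `month_index` and `split_train_oot` are hypothetical and not taken from the talk.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month number so months can be compared."""
    return d.year * 12 + (d.month - 1)

def split_train_oot(transactions, oot_months=2):
    """Split records into train and out-of-time (OOT) sets by calendar month.

    `transactions` is a list of dicts with a "date" key (hypothetical schema).
    The most recent `oot_months` calendar months go to OOT, the rest to train.
    """
    latest = max(month_index(t["date"]) for t in transactions)
    cutoff = latest - oot_months  # last month index that still belongs to train
    train = [t for t in transactions if month_index(t["date"]) <= cutoff]
    oot = [t for t in transactions if month_index(t["date"]) > cutoff]
    return train, oot

# Toy data: one transaction per month over 14 months (Jan 2017 to Feb 2018).
toy = [{"id": i, "date": date(2017 + (i // 12), (i % 12) + 1, 15)} for i in range(14)]
train, oot = split_train_oot(toy)
```

Holding out the most recent months, rather than a random sample, tests the model on data that lies strictly after everything it was trained on, which is closer to how it would be used in production.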
Data Processing – Feature Engineering
and many more (e.g. merchant)… finally, ratios between values and current transaction.
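As an illustration of a ratio feature, the sketch below divides the current transaction amount by the largest amount seen for the same card in a look-back window. The function name `ratio_to_recent_max`, the (timestamp, amount) history format, and the 30-day window are assumptions made for this example only.

```python
from datetime import datetime, timedelta

def ratio_to_recent_max(history, current_amount, current_time, window_days):
    """Ratio of the current amount to the largest amount in the look-back window.

    `history` is a list of (timestamp, amount) pairs for one card (hypothetical
    format). Returns None when the window holds no usable prior transaction, so
    a later step can treat "no history" explicitly.
    """
    start = current_time - timedelta(days=window_days)
    recent = [amt for ts, amt in history if start <= ts < current_time]
    if not recent:
        return None
    largest = max(recent)
    return current_amount / largest if largest > 0 else None

now = datetime(2018, 3, 1, 12, 0)
history = [
    (now - timedelta(days=40), 500.0),  # outside the 30-day window, ignored
    (now - timedelta(days=10), 10.0),
    (now - timedelta(days=3), 25.0),
    (now - timedelta(days=1), 8.0),
]
example_ratio = ratio_to_recent_max(history, 100.0, now, window_days=30)
```

Here the current amount of 100.0 is compared with the in-window maximum of 25.0; a large ratio indicates a transaction that is unusually big for that card's recent behaviour.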
Development – Feature Selection
Feature count at each stage: 20k, 10k, 1k, 500
Univariate
• Remove zero or extremely low variance
• Remove if all or extremely high level of missing values
• Very low Information Value or Spearman rank correlation
Model
• Lasso coefficient importance
• Random Forest feature importance
Wrapper
• Recursive Feature Elimination
Domain
• Business Review
• Implementation Considerations
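The first two univariate filters could be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function name `univariate_filter`, the column format, and both thresholds are assumptions, and the Information Value, Spearman, Lasso, Random Forest, and Recursive Feature Elimination steps are not shown.

```python
from statistics import pvariance

def univariate_filter(columns, min_variance=1e-8, max_missing=0.95):
    """Return names of features that pass the variance and missing-value filters.

    `columns` maps feature name -> list of values, with None marking missing.
    Both thresholds are illustrative assumptions.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if not present or missing_share > max_missing:
            continue  # all or nearly all values missing
        if pvariance(present) <= min_variance:
            continue  # constant or near-constant feature
        kept.append(name)
    return kept

features = {
    "amount": [10.0, 250.0, 31.5, 8.0],
    "constant_flag": [1, 1, 1, 1],
    "mostly_missing": [None, None, None, 5.0],
    "hour": [1, 13, 22, 9],
}
selected = univariate_filter(features, max_missing=0.5)
```

Cheap filters like these are applied first because they need no model fit, which matters when starting from a very large candidate set.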
Development – Feature Selection & Business Review
Debit CP model feature: ratio of current transaction amount and maximum contactless in last X days
[Figure: the feature shown for Genuine and Fraud transactions]
Development – Model Development Cycle
• Select Features
• Pick Model
• Train
• Evaluate
• Review and Optimize
Development – Hyper-parameter Optimization
Example of Bayesian hyper-parameter optimization using hyperopt
[Figure: Performance plotted against Number of Iterations and against Fraction of Features in a Split]
Validation – CP Model Performance
• Precision Recall Curve: AUC ~0.23
• ROC Curve: AUC ~0.95
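A high ROC AUC alongside a much lower precision-recall AUC is typical when positives are very rare: the ROC curve does not depend on the class balance, while precision does. The sketch below shows this on synthetic data. It is not the talk's data or evaluation code; the score distributions are invented, and average precision is used as one common estimate of the area under the precision-recall curve.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive's rank.

    Tied scores are ranked in arbitrary order, which is fine for a sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos

# Synthetic, heavily imbalanced data (about 0.2% positives); purely illustrative.
rng = random.Random(0)
labels = [0] * 20_000 + [1] * 40
scores = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
scores += [rng.gauss(2.3, 1.0) for _ in range(40)]

demo_roc = roc_auc(labels, scores)
demo_ap = average_precision(labels, scores)
print(f"ROC AUC: {demo_roc:.3f}, average precision: {demo_ap:.3f}")
```

Because even a small false positive rate on millions of genuine transactions produces many alerts, the precision-recall view is usually the more demanding of the two for fraud models.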
Validation – CP Model Performance
[Figure: two panels, Transaction Detection Rate and Value Detection Rate, each plotted against False Positive Rate [bps] for the New model and the Incumbent model]
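The two detection rates can be made concrete with a small sketch. The definitions used here are assumptions, since the slides do not spell them out: the false positive rate is taken as flagged genuine transactions over all genuine transactions (1 bps = 0.01%), the transaction detection rate as the share of fraud transactions flagged, and the value detection rate as the share of fraud amount flagged. The function name and record format are hypothetical.

```python
def detection_rates_at_fpr(records, fpr_bps):
    """Transaction and value detection rates at a false positive budget in bps.

    `records` is a list of (score, is_fraud, amount) tuples (hypothetical format).
    Assumption: FPR = flagged genuine / all genuine, with 1 bps = 0.01%.
    """
    genuine_scores = sorted((s for s, f, _ in records if not f), reverse=True)
    allowed_fp = int(len(genuine_scores) * fpr_bps // 10_000)
    # Flag only scores strictly above the first genuine score we may not flag.
    if allowed_fp < len(genuine_scores):
        threshold = genuine_scores[allowed_fp]
    else:
        threshold = float("-inf")
    frauds = [(s, a) for s, f, a in records if f]
    caught = [(s, a) for s, a in frauds if s > threshold]
    tdr = len(caught) / len(frauds)
    vdr = sum(a for _, a in caught) / sum(a for _, a in frauds)
    return tdr, vdr

# Toy data: 10,000 genuine transactions and 4 frauds of different value.
records = [(i / 10_000, False, 20) for i in range(10_000)]
records += [(0.99995, True, 1000), (0.9995, True, 50), (0.5, True, 10), (0.2, True, 40)]
tdr, vdr = detection_rates_at_fpr(records, fpr_bps=10)
```

In this toy case half of the fraud transactions are caught, but they account for most of the fraud value, which is why the two rates are reported separately.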
Validation – CnP Model Performance
[Figure: two panels, Transaction Detection Rate and Value Detection Rate, each plotted against False Positive Rate [bps] for the New model and the Incumbent model]
Validation – CP Model Interrogation
[Figure: Fraud Risk against Time since a new card was issued, for Chip used and Chip not-used]
Implementation – Development Artefacts
Model Artefacts
• Model Specification (JSON)
• Model File (txt)
• Validation Data (parquet)
Feature Artefacts
• Feature Specification (JSON)
• Validation Data (parquet)
Implementation – Process
• Artefacts from Nexus using Jenkins
• Model File and Implementation Validation
• Feature Code Gen and Validation
• Feature Maturation and Shadow Operations
• Production Validation and Go-Live
Summary
• An increasing number of customers become victims of fraud, especially in remote purchases (e.g. e-commerce).
• To improve fraud prevention and customer experience, we undertook development of generation 1 models for Debit CP and CnP using tree ensemble algorithms
– 12 months of training data were converted to about 20k features to develop the best possible models
– Both models are in implementation; shadow operations are due in May, with go-live during summer
• R&D for generation 2 models (e.g. RNNs, autoencoders) is ongoing with promising results, but implementation requires more work…