Machine Learning and Fraud Detection
February 2020
Tamsin Crossland, Senior Architect, @CrosslandTamsin
World Class Payment and Enterprise Solutions for the global financial sector

Two main types of article on AI

Machine Learning and Fraud Detection
• Payments
• Demonstration
• Thoughts

The increasing scale, diversity, and complexity of fraud
• Vulnerabilities in payments services have increased as the shift to digital and mobile customer platforms accelerates.
• New solutions have also led to payments transactions being executed more quickly, leaving banks and processors with less time to identify, counteract, and recover the underlying funds when necessary.
• The sophistication of fraud has increased, with greater collaboration among bad actors, including the exchange of stolen data, new techniques, and expertise on the dark web.

The fraud threat facing banks and payments firms has grown dramatically in recent years. Estimates of fraud's impact on consumers and financial institutions vary significantly, but losses to banks alone are conservatively estimated to exceed $31 billion globally by 2018.

Instant Payments

Rule Based Systems
Example: flag a credit card transaction if it is more than ten times larger than the average for this customer.
• Allow human experts to apply their subject matter expertise.
• Difficult and time-consuming to implement well.
• Require the painstaking definition of a rule for every possible anomaly.
• If the experts omit a rule, the corresponding anomalies go undetected and nobody suspects it.
• Today, legacy systems apply about 300 different rules on average to approve a transaction.
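The example rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `multiplier` parameter, and the sample amounts are assumptions, not part of any real rule engine.

```python
from statistics import mean


def exceeds_customer_average(amount, past_amounts, multiplier=10.0):
    """Return True when `amount` is more than `multiplier` times the
    customer's average past transaction amount (hypothetical rule)."""
    if not past_amounts:
        # No history to compare against, so this rule cannot fire.
        return False
    return amount > multiplier * mean(past_amounts)


# Hypothetical card history for one customer (average is 30.0).
history = [20.0, 35.0, 50.0, 15.0]

print(exceeds_customer_average(45.0, history))   # an ordinary purchase
print(exceeds_customer_average(400.0, history))  # far above the average
```

A production system would chain hundreds of such hand-written checks, which is where the maintenance cost described above comes from.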

Neural Network

Weights and Biases

Training

Training (figure: transactions labelled as fraudulent or non-fraudulent)

Use Case

Rule Based versus Machine Learning
Rule Based:
• Catches obvious fraudulent scenarios
• Large amount of manual work to enumerate all possible detection rules
• Easier to explain
Machine Learning:
• Finds hidden correlations in data
• Automatic detection of possible fraud scenarios
• More difficult to explain

Install TensorFlow

Install Libraries
• pandas: data analysis
• scikit-learn: data mining and data analysis
winpty docker exec -i -t 07a24f61e7b6 bash
pip install pandas
pip install -U scikit-learn

The dataset contains two days' worth of credit card transactions made in September 2013 by European cardholders.
• 492 frauds out of 284,807 transactions (0.172%).
• Contains only numerical input variables, which are the result of a Principal Component Analysis transformation (a method of extracting relevant information from confusing data sets).
• Due to confidentiality issues, the original features and further background information about the data cannot be provided.
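The counts above show why plain accuracy is a weak yardstick on this dataset. The short calculation below is a sketch using only the two figures quoted on the slide. It works out the fraud rate and the accuracy of a trivial model that labels every transaction as non-fraudulent.

```python
# Figures quoted on the slide.
total_transactions = 284_807
fraud_count = 492

non_fraud_count = total_transactions - fraud_count
fraud_rate = fraud_count / total_transactions

# Accuracy of a "model" that always predicts non-fraud:
# it is right on every genuine transaction and wrong on every fraud.
baseline_accuracy = non_fraud_count / total_transactions

print(f"non-fraud transactions: {non_fraud_count}")
print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-non-fraud accuracy: {baseline_accuracy:.4%}")
```

Such a baseline scores very high accuracy while catching no fraud at all, which motivates balancing the data before training.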

• Features V1, V2, ... V28 are the principal components obtained with PCA.
• The feature 'Amount' is the transaction amount.
• The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset.
• The only features which have not been transformed with PCA are 'Time' and 'Amount'.
• The feature 'Class' is the response variable; it takes value 1 in case of fraud and 0 otherwise.

Demonstration 1

Balance data

49%

Data Loss: 284315 -> 492

Attempt 3: Janio Martinez Bachmann

Libraries
• seaborn: a library for making statistical graphics in Python.
• imbalanced-learn: a toolbox for imbalanced datasets in machine learning.

Underfitting and Overfitting

Overfitting

Outliers

Principal Component Analysis

Demonstration 2

Scale time and amount
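'Time' and 'Amount' are the only columns not already transformed by PCA, so they sit on very different scales from V1 to V28. The slide does not say which scaler the demonstration used. The sketch below assumes plain standardization (zero mean, unit standard deviation), and the sample values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw columns: seconds since the first transaction, and amounts.
time_seconds = [0.0, 60.0, 3600.0, 86400.0]
amounts = [10.0, 20.0, 30.0]

scaled_time = standardize(time_seconds)
scaled_amount = standardize(amounts)
print(scaled_amount)
```

A scaler based on the median and interquartile range would be less sensitive to the extreme amounts that fraud data tends to contain.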

Random under-sampling
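Random under-sampling balances the classes by discarding majority-class rows until both classes are the same size. The following is a minimal sketch on made-up rows. The function name and the dictionary layout are assumptions, with 'Class' used as the label as in the dataset description.

```python
import random


def random_undersample(rows, label_key="Class", seed=0):
    """Keep every minority-class row and an equally sized random
    sample of majority-class rows, then shuffle the result."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    genuine = [r for r in rows if r[label_key] == 0]
    if len(fraud) <= len(genuine):
        minority, majority = fraud, genuine
    else:
        minority, majority = genuine, fraud
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 1000 rows, of which 20 are labelled as fraud.
rows = [{"Amount": float(i), "Class": 1 if i % 50 == 0 else 0} for i in range(1000)]
balanced = random_undersample(rows)
print(len(rows), "->", len(balanced))
```

The price is the information loss noted earlier: most of the genuine transactions are thrown away.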

Correlation Matrix
Used to show which features most heavily influence whether a transaction is fraudulent.
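One column of such a matrix, the correlation of each feature with the 'Class' label, can be sketched with the Pearson coefficient. The feature names and values below are hypothetical and are not taken from the real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical features and labels (1 = fraud, 0 = genuine).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feature_a": [0.1, 0.2, 0.3, 2.0, 2.1, 2.2],     # rises with fraud
    "feature_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # weakly related
    "feature_c": [2.0, 2.1, 2.2, 0.1, 0.2, 0.3],     # falls with fraud
}

correlations = {name: pearson(col, labels) for name, col in features.items()}
# Rank features by the strength of their relationship with the label.
for name in sorted(correlations, key=lambda k: abs(correlations[k]), reverse=True):
    print(name, round(correlations[name], 3))
```

Strongly positive or strongly negative values are both informative; values near zero suggest little linear relationship with fraud.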

Anomaly detection

After implementing outlier reduction, our accuracy improved by over 3%. Some outliers can distort the accuracy of our models, but remember that we have to avoid an extreme amount of information loss, or else our model runs the risk of underfitting.
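The slide does not say how the outliers were identified. One common approach, assumed here purely for illustration, is the interquartile range (IQR) rule: drop values further than `k` times the IQR from the quartiles. A larger `k` removes fewer points and so limits the information loss mentioned above.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper cut-offs: k * IQR below Q1 and above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR bounds."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical feature values with one extreme point.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
cleaned = drop_outliers(values)
print(len(values), "->", len(cleaned))
```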

Dimensionality Reduction and Clustering
t-SNE takes a high-dimensional dataset and reduces it to a low-dimensional graph whilst still retaining a lot of the information.

SMOTE: Synthetic Minority Over-sampling Technique
• Solving the class imbalance: SMOTE creates synthetic points from the minority class in order to reach an equal balance between the minority and majority class.
• Location of the synthetic points: SMOTE takes each minority-class point and its closest minority-class neighbours, and creates synthetic points along the lines between them.
• Final effect: more information is retained, since we did not have to delete any rows, unlike in random under-sampling.
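The interpolation idea can be sketched in plain Python. This is a simplified illustration of the principle, not the imbalanced-learn implementation; the function name, the default `k`, and the sample points are assumptions.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points, each placed at a random position
    on the segment between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest other minority points, closest first.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # 0 = at the base point, 1 = at the neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class (fraud) points in two dimensions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies.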

Compile the model
• Metric: the following example uses accuracy, the fraction of the transactions that are correctly classified.
• Optimizer: optimizers shape and mold your model into its most accurate possible form by adjusting the weights.
• Loss function: the loss function is the guide to the terrain, telling the optimizer when it is moving in the right or wrong direction.
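How the three pieces fit together can be shown without any framework. The sketch below is not the Keras call from the demonstration: it assumes a one-weight logistic model, binary cross-entropy as the loss, and plain gradient descent as the optimizer, all on made-up data.

```python
from math import exp, log


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def binary_cross_entropy(w, b, xs, ys):
    """Loss: lower values mean predictions are closer to the labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * log(p + eps) + (1 - y) * log(1 - p + eps)
    return total / len(xs)


def accuracy(w, b, xs, ys):
    """Metric: fraction of samples classified correctly at threshold 0.5."""
    correct = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return correct / len(xs)


def gradient_step(w, b, xs, ys, learning_rate):
    """Optimizer: move the weight and bias against the loss gradient."""
    n = len(xs)
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Hypothetical one-feature data: label 1 when the feature is positive.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
loss_before = binary_cross_entropy(w, b, xs, ys)
for _ in range(200):
    w, b = gradient_step(w, b, xs, ys, learning_rate=0.5)
loss_after = binary_cross_entropy(w, b, xs, ys)

print(loss_before, loss_after, accuracy(w, b, xs, ys))
```

The optimizer only ever sees the loss; accuracy is reported for people to read and does not steer the training.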

Confusion Matrix
             Predicted: no     Predicted: yes
Actual: no   True negative     False positive
Actual: yes  False negative    True positive
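The four cells can be counted directly from the actual and predicted labels. The sketch below uses made-up labels (1 = fraud) and also derives precision and recall, which are more informative than accuracy when fraud is rare.

```python
def confusion_counts(actual, predicted):
    """Return (true_neg, false_pos, false_neg, true_pos) for 0/1 labels."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1:
            fn += 1  # fraud that was missed
        elif p == 1:
            fp += 1  # genuine transaction flagged as fraud
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical labels for ten transactions.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(actual, predicted)
precision = tp / (tp + fp)  # of the flagged transactions, how many were fraud
recall = tp / (tp + fn)     # of the frauds, how many were flagged
accuracy = (tp + tn) / len(actual)

print(tn, fp, fn, tp)
print(precision, recall, accuracy)
```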

Using SMOTE

Unsupervised Learning

Iris Data Set
• 50 samples from each of three species of Iris.
• Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
• The objective of K-means is simple: group similar data points together and discover underlying patterns.
• To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.

KMeans
K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
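The iterative assignment described above can be sketched in plain Python. This is a minimal version of the standard assign-then-update loop run on made-up two-dimensional points rather than the Iris measurements; the function name and defaults are assumptions.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Cluster `points` into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical unlabeled data forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

No labels are used at any point; the groups emerge only from the distances between points.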

Demonstration 3

Two Questions Every Machine Learning Project Should Ask
• Is the purpose of the project ethical?
• Is the implementation of the project ethical?

Is the purpose of the project ethical?
• What are the additional benefits of the project?
• Who does it benefit?

Is the implementation of the project ethical?
• Does it implement unfair bias?
• Disclose to stakeholders when they are interacting with an AI.
• Governance: the AI systems are secure, reliable and robust, and appropriate processes are in place to ensure responsibility and accountability for them.

One last thing: is it intelligent?

(Figure: transactions labelled as fraudulent or non-fraudulent)
