Introduction to fraud detection Charlotte Werger Data Scientist - PowerPoint PPT Presentation

DataCamp Fraud Detection in Python
Introduction to fraud detection
Charlotte Werger, Data Scientist

Meet your instructor
Hi, my name is Charlotte and I am a Data Scientist.

What is fraud?
Examples of fraud: insurance fraud, credit card fraud, identity theft, money laundering, tax evasion, product warranty fraud, healthcare fraud.
Fraud is uncommon, concealed, changing over time, and organized.

Fraud detection is challenging

How companies deal with fraud
Fraud analytics teams:
1. Often use rule-based systems, based on manually set thresholds and experience
2. Check the news
3. Receive external lists of fraudulent accounts and names
4. Sometimes use machine learning algorithms to detect fraud or suspicious behaviour

Let's have a look at some data
    import pandas as pd

    df = pd.read_csv('creditcard_data.csv')
    df.head()

             V1        V2  ...  Amount  Class
    0 -0.078306  0.025427  ...    1.77      0
    1  0.000531  0.019911  ...   30.90      0
    2  0.015375 -0.038491  ...   23.57      0
    3  0.137096 -0.249694  ...   13.99      0
    4 -0.014937  0.005771  ...    1.29      0

    df.shape

    (5050, 30)
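Because fraud is uncommon, a natural first check on data like this is how the labels are distributed. The following is a minimal sketch, not part of the original slides. It assumes that `Class` encodes fraud as 1 and non-fraud as 0 (the rows shown above only contain zeros, so that encoding is an assumption), and it uses a small invented DataFrame in place of `creditcard_data.csv`.

```python
import pandas as pd

# Tiny stand-in for creditcard_data.csv; the values are invented for illustration
df = pd.DataFrame({
    "Amount": [1.77, 30.90, 23.57, 13.99, 1.29, 250.00, 4.50, 999.99],
    "Class":  [0, 0, 0, 0, 0, 1, 0, 1],
})

# Count observations per class (assumed encoding: 1 = fraud, 0 = non-fraud)
counts = df["Class"].value_counts()

# Share of fraud cases in the data
fraud_ratio = counts.loc[1] / len(df)

print(counts)
print(f"Fraud ratio: {fraud_ratio:.2%}")
```

On real fraud data the ratio is usually far smaller than in this toy frame, which is what motivates the resampling methods covered later.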

Let's practice!

Increasing successful detections using data resampling

Undersampling
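The idea behind random undersampling can be sketched in plain Python: keep every minority-class row and randomly drop majority-class rows until the two classes are the same size. This is an illustration rather than the slides' own code. The function name `random_undersample`, the toy data, and the assumption that label 1 is the smaller (fraud) class are all hypothetical. In practice, imbalanced-learn offers `RandomUnderSampler` in `imblearn.under_sampling` for the same purpose.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Assumes the fraud class (label 1) is the smaller one
    kept_majority = rng.sample(majority, len(minority))
    keep = sorted(minority + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]

# Invented toy data: one feature per row, two fraud cases out of eight
X = [[1.77], [30.90], [23.57], [13.99], [1.29], [250.00], [4.50], [999.99]]
y = [0, 0, 0, 0, 0, 1, 0, 1]

X_resampled, y_resampled = random_undersample(X, y)
print(y_resampled.count(0), y_resampled.count(1))
```

The balanced result keeps only four of the eight rows, which shows the cost of this method: most of the non-fraud data is thrown away.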

Oversampling

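Oversampling in Python
    from imblearn.over_sampling import RandomOverSampler

    # Randomly duplicate minority-class rows until the classes are balanced
    method = RandomOverSampler()
    # fit_resample replaces fit_sample, which newer imbalanced-learn releases removed
    X_resampled, y_resampled = method.fit_resample(X, y)

    # compare_plots is a plotting helper that is not defined in these slides
    compare_plots(X_resampled, y_resampled, X, y)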

Synthetic Minority Oversampling Technique (SMOTE)
Source: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
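The core step of SMOTE is interpolation: a new minority-class point is placed at a random position on the line between an existing minority point and one of its nearest minority-class neighbours. The sketch below illustrates that step only and is not the imbalanced-learn implementation. The function name `smote_sketch`, the value of `k`, and the toy points are assumptions made for illustration.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (excluding x itself)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # The new point lies at a random position on the segment from x to nb
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Invented minority-class (fraud) observations with two features each
fraud = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_sketch(fraud, n_new=3)
print(new_points)
```

Every synthetic point falls between two real fraud cases, so the new data is not a set of exact duplicates, but it is also not observed data.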

Which resampling method to use?
Random Under Sampling (RUS): throws away data, but is computationally efficient
Random Over Sampling (ROS): straightforward and simple, but trains your model on many duplicates
Synthetic Minority Oversampling Technique (SMOTE): gives a more sophisticated and realistic dataset, but you are training on "fake" data

When to use resampling methods
Use resampling methods on your training set, never on your test set!
    from imblearn.over_sampling import BorderlineSMOTE
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Define resampling method and split into train and test
    # (older imbalanced-learn releases wrote this as SMOTE(kind='borderline1'))
    method = BorderlineSMOTE(kind='borderline-1')
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)

    # Apply resampling to the training data only
    X_resampled, y_resampled = method.fit_resample(X_train, y_train)

    # Continue fitting the model and obtain predictions
    model = LogisticRegression()
    model.fit(X_resampled, y_resampled)

    # Get your performance metrics
    predicted = model.predict(X_test)
    print(classification_report(y_test, predicted))

Fraud detection algorithms in action

Traditional fraud detection with rule-based systems

Drawbacks of using rule-based systems
Rule-based systems have their limitations:
1. Fixed thresholds per rule to determine fraud
2. Limited to yes/no outcomes
3. Fail to capture interaction between features
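These limitations are easy to see in a small sketch of a rule-based check. The function `rule_based_flag`, the field names, and both thresholds are hypothetical and chosen only for illustration; real systems set such thresholds manually, as described earlier.

```python
def rule_based_flag(transaction, max_amount=500.0, max_daily_count=10):
    """Flag a transaction when any single rule is broken (thresholds are made up)."""
    rules = [
        transaction["amount"] > max_amount,
        transaction["daily_count"] > max_daily_count,
    ]
    return any(rules)

transactions = [
    {"amount": 499.0, "daily_count": 10},  # just under both thresholds
    {"amount": 501.0, "daily_count": 1},   # barely over one threshold
]

flags = [rule_based_flag(t) for t in transactions]
print(flags)
```

The first transaction sits at the edge of both rules and still passes, while the second is flagged for exceeding one threshold by a small margin. Each rule looks at one feature in isolation, and the outcome is only ever yes or no.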

Why use machine learning for fraud detection?
1. Machine learning models adapt to the data, and thus can change over time
2. They use all the data combined rather than a threshold per feature
3. They can give a score, rather than a yes/no
4. They will typically perform better and can be combined with rules
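The difference between a yes/no outcome and a score can be sketched with scikit-learn. This example is an illustration, not taken from the slides: the toy data is invented, label 1 is assumed to mean fraud, and the 0.8 cut-off is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: two features per transaction, label 1 = fraud (assumption)
X = np.array([[1.0, 0.2], [2.0, 0.1], [1.5, 0.3],
              [8.0, 2.5], [9.0, 3.0], [7.5, 2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict returns a yes/no label per transaction
labels = model.predict(X)

# predict_proba returns class probabilities; column 1 is the score for class 1
scores = model.predict_proba(X)[:, 1]

# A score can be cut wherever the use case requires (0.8 is arbitrary here)
flagged = scores > 0.8

print(labels)
print(scores.round(3))
```

Unlike a fixed rule, the score is computed from both features together, and the cut-off can be moved without retraining the model.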

Refresher on machine learning models
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn import metrics

    # Step 1: Split your features and labels into train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Step 2: Define which model you want to use
    model = LinearRegression()

    # Step 3: Fit the model to your training data
    model.fit(X_train, y_train)

    # Step 4: Obtain model predictions from your test data
    y_predicted = model.predict(X_test)

    # Step 5: Compare y_test to predictions and obtain performance metrics
    print(metrics.r2_score(y_test, y_predicted))

    0.821206237313

What you'll be doing in the upcoming chapters
Chapter 2. Supervised learning: train a model using existing fraud labels
Chapter 3. Unsupervised learning: use your data to determine what is 'suspicious' behaviour without labels
Chapter 4. Fraud detection using text data: learn how to augment your fraud detection models with text mining and topic modelling
