Two Step Graph-based Semi-supervised Learning for Online Auction Fraud Detection
Phiradet Bangcharoensap¹, Hayato Kobayashi², Nobuyuki Shimizu², Satoshi Yamauchi², and Tsuyoshi Murata¹
¹ Tokyo Institute of Technology, ² Yahoo Japan Corporation

2 Definition of Fraudster
Competitive shilling: auction users who bid on their own products under other user IDs in order to drive up the final price.
[Figure: on an online auction website, a fraudster sells a product as ID1 and bids on it as ID2.]

3 Key Ideas
Fraudsters
• frequently participate in auctions hosted by fraudulent sellers
• work in the same group
Innocents
• rarely interact with fraudsters
• frequently interact with famous sellers, or uniformly interact with various sellers

4 Key Ideas
Fraudsters
• frequently participate in auctions hosted by fraudulent sellers
• work in the same group
Innocents
• rarely interact with fraudsters
• frequently interact with famous sellers, or uniformly interact with various sellers (U)
The slide tags these ideas as homophily (H) and uniform interaction of innocents (U).

5 Contributions
1. Novel application of Modified Adsorption (MAD) [Talukdar & Crammer, ECML PKDD '09]
– Previously used in NLP
– Homophily (H): smoothness constraint
– Uniformity of innocents (U): dummy label
2. Incorporation of weighted degree centrality
– Fraudsters tend to form very strong ties.
– Helps yield better results.

6 Overview
Input: auction transactions (product, seller, bidder), blacklisted IDs, whitelisted IDs
1. Graph construction: auction transactions → unlabeled graph
2. Initial label assignment: blacklisted and whitelisted IDs are marked on the graph
3. Modified Adsorption: partially labeled graph → soft label matrix
4. Fraud scoring: soft label matrix → ordered list of users, from most suspicious to least suspicious
Output: ordered list of users
Objective: Fraudsters working in collusion with the blacklisted users are ranked at the top.

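7 Graph Construction
Online auction transactions:
Product  Seller  Bidder
P1       A       B
P1       A       C
P2       A       C
P3       B       A
P3       B       C
P3       B       C
P3       B       C
Weighted undirected graph: each user is a node, and the weight of an edge is the number of distinct products (#Product) on which the two users interacted.
W_AB = |{P1, P3}| = 2
W_AC = |{P1, P2}| = 2
W_BC = |{P3}| = 1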
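The construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_graph` and the choice to skip self-loops are assumptions, and the edge weight follows the slide's example (the number of distinct products shared by a seller-bidder pair).

```python
from collections import defaultdict

# Transactions from the slide's example: (product, seller, bidder).
transactions = [
    ("P1", "A", "B"),
    ("P1", "A", "C"),
    ("P2", "A", "C"),
    ("P3", "B", "A"),
    ("P3", "B", "C"),
    ("P3", "B", "C"),
    ("P3", "B", "C"),
]


def build_graph(rows):
    """Return {frozenset({u, v}): weight} for an undirected graph.

    The weight is the number of distinct products on which users u and v
    met as seller and bidder, in either direction.
    """
    products = defaultdict(set)
    for product, seller, bidder in rows:
        if seller != bidder:  # assumption: self-loops are ignored
            products[frozenset((seller, bidder))].add(product)
    return {pair: len(items) for pair, items in products.items()}


weights = build_graph(transactions)
for pair in sorted(tuple(sorted(p)) for p in weights):
    print(pair, weights[frozenset(pair)])
```

Repeated bids by the same bidder on the same product (the three P3 rows for B and C) count once, which matches W_BC = 1 in the example.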

8 Graph-based SSL
Modified Adsorption (MAD) [Talukdar & Crammer, '09] is used.
Input: partially labeled, weighted undirected graph
– Node: an instance that we want to classify (blacklisted, whitelisted, or unlabeled "?")
– Edge: similarity between instances
Output: soft label matrix of size |Nodes| × (|Possible Labels| + 1), with columns -, +, and #
– Each node is assigned a score indicating the likelihood of its being each label (soft labels).
– The extra column is the dummy label (U), which indicates that there is not enough information.

9 Dummy Label
• Exceptional case of all other labels
[Equation: entropy (amount of uncertainty) of vertex v, defined over the neighbors of v and the weighted degree of v.]
The score of the dummy label (U) is high when the vertex interacts uniformly with its neighbors.
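The intuition behind the dummy label can be illustrated with a short entropy computation. This is a sketch under stated assumptions: it normalizes a vertex's edge weights by its weighted degree and takes the Shannon entropy of the result, which is how the slide's annotations read. The further step that turns this entropy into the dummy-label score in MAD is not reproduced here, and the function name `neighbor_entropy` is hypothetical.

```python
import math


def neighbor_entropy(edge_weights):
    """Shannon entropy of a vertex's edge weights after normalizing them
    by the vertex's weighted degree (the sum of the weights)."""
    total = sum(edge_weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in edge_weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


# A vertex that interacts uniformly with four neighbors...
uniform = neighbor_entropy([3, 3, 3, 3])
# ...versus one whose interactions concentrate on a single neighbor.
skewed = neighbor_entropy([9, 1, 1, 1])

print(f"uniform: {uniform:.3f}, skewed: {skewed:.3f}")
```

The uniform vertex reaches the maximum entropy for four neighbors, log 4, while the skewed vertex scores lower, in line with the claim that uniform interaction drives the dummy score up.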

10 Modified Adsorption (MAD)
Tradeoff between fitting and smoothness constraints
- Fitting: retain initial labels of seed nodes
- Smoothness (H): assign same labels to adjacent nodes
Solved as a convex optimization problem with three terms (fitting, smoothness, and regularization), where
Ŷ is a matrix storing scores of labels (soft label matrix)
Y stores seed information
S indicates positions of seed vertices
L is the Laplacian matrix
R encodes scores of the dummy label and L2 regularization.
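For reference, the objective in the cited MAD paper has the following general form. It is written here from the definitions on this slide; the hyperparameters μ1, μ2, μ3 and the per-label column index l are taken from the cited paper's formulation rather than from the slide, so treat them as assumptions about what the slide displayed.

```latex
\min_{\hat{Y}} \sum_{l}
\Big[
  \underbrace{\mu_1\,(Y_l - \hat{Y}_l)^{\top} S\,(Y_l - \hat{Y}_l)}_{\text{fitting}}
  + \underbrace{\mu_2\,\hat{Y}_l^{\top} L\,\hat{Y}_l}_{\text{smoothness}}
  + \underbrace{\mu_3\,\lVert \hat{Y}_l - R_l \rVert_2^{2}}_{\text{regularization}}
\Big]
```

Here Y_l, Ŷ_l, and R_l denote the l-th columns of Y, Ŷ, and R. The fitting term only penalizes deviations at seed vertices (selected by S), and the smoothness term penalizes score differences across heavily weighted edges.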

11 Overview (2)
Input: auction transactions (product, seller, bidder), blacklisted IDs, whitelisted IDs
1. Graph construction: auction transactions → unlabeled graph
2. Initial label assignment: blacklisted and whitelisted IDs are marked on the graph
3. Modified Adsorption: partially labeled graph → soft label matrix
4. Fraud scoring: soft label matrix → ordered list of users, from most suspicious to least suspicious
Output: ordered list of users
Objective: Fraudsters working in collusion with the blacklisted users are ranked at the top.

12 Fraud Scoring
Input: soft label matrix from MAD (|Nodes| rows; columns Bad, Good, Dummy)
Output: fraud score of each node
The fraud score is the ratio of the Bad score to the total of the Bad, Good, and Dummy scores.
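The scoring rule above amounts to a one-line ratio per node. The sketch below is illustrative only: the soft-label values and user names are made up, and returning 0.0 for a node whose scores are all zero is an assumption, since the slides do not cover that case.

```python
def fraud_score(bad, good, dummy):
    """Ratio of the Bad score to the total of the Bad, Good, and Dummy scores."""
    total = bad + good + dummy
    return bad / total if total > 0 else 0.0  # assumption: all-zero rows score 0


# Hypothetical soft label rows: user -> (Bad, Good, Dummy).
soft_labels = {
    "u1": (0.6, 0.1, 0.3),
    "u2": (0.05, 0.7, 0.25),
    "u3": (0.2, 0.2, 0.6),
}

scores = {user: fraud_score(*row) for user, row in soft_labels.items()}
# Most suspicious users first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the Dummy score sits in the denominator, a node that interacts uniformly with its neighbors is pushed down the ranking even when its Bad score exceeds its Good score.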

13 Contributions
1. Novel application of Modified Adsorption (MAD) [Talukdar & Crammer, ECML PKDD '09]
– Homophily (H): smoothness constraint
– Uniform interaction of innocents (U): dummy label
2. Incorporation of weighted degree centrality (WDC)
– Fraudsters form very strong ties.

14 Weighted Degree Centrality (WDC)
The weighted degree centrality of vertex v is the total weight of the edges incident to v:
$k_w(v) = \sum_{u \in N(v)} w(u, v)$
where N(v) is the set of neighbors of v and w(u, v) is the weight of edge (u, v).
Example: a vertex v with three edges of weights 3, 1, and 2 has k_w(v) = 6.
Fraudsters tend to have higher weighted degree centralities because of stronger ties (H).
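The slide's example can be checked with a small sketch. The adjacency-dictionary representation and the neighbor names a, b, c are assumptions made for illustration; only the weights 3, 1, and 2 come from the slide.

```python
def weighted_degree_centrality(graph, v):
    """Sum of the weights of all edges incident to vertex v."""
    return sum(graph[v].values())


# Undirected graph stored as {node: {neighbor: weight}}.
graph = {
    "v": {"a": 3, "b": 1, "c": 2},
    "a": {"v": 3},
    "b": {"v": 1},
    "c": {"v": 2},
}

k_w = weighted_degree_centrality(graph, "v")
print(k_w)
```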

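15 Fraud Scoring + WDC
Input: soft label matrix from MAD (|Nodes| rows; columns Bad, Good, Dummy)
Output: fraud score of each node (2-STEP)
[Equation: the 2-STEP score combines the MAD fraud score with the weighted degree centrality of vertex v, which is computed from the weight of each edge (u, v) over the neighbors of v.]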

16 Experiments
• Questions
1. Does the dummy label help?
2. How does the method compare with unsupervised methods?
3. How does the method compare with a state-of-the-art Sybil defense method?
• Evaluation metric
– Normalized discounted cumulative gain (NDCG), computed by comparing the ranked results with the blacklisted users
– Higher NDCG is better.
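A minimal NDCG sketch may help make the metric concrete. The slides do not specify the gain function, so this version assumes binary relevance (1 for a blacklisted user, 0 otherwise) and the common log2 discount; the helper names and the example users are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_users, blacklisted, k):
    """NDCG of the top-k ranked users against a set of blacklisted users.

    Assumption: binary relevance, and every blacklisted user is a candidate,
    so the ideal ranking places min(k, |blacklisted|) hits at the top.
    """
    gains = [1.0 if user in blacklisted else 0.0 for user in ranked_users[:k]]
    ideal = dcg([1.0] * min(k, len(blacklisted)))
    return dcg(gains) / ideal if ideal > 0 else 0.0


ranked = ["u1", "u2", "u3", "u4"]  # most suspicious first
blacklisted = {"u1", "u3"}
score = ndcg_at_k(ranked, blacklisted, k=4)
print(f"NDCG@4 = {score:.4f}")
```

A ranking that places every blacklisted user ahead of all others scores 1.0; hits that appear lower in the list contribute less because of the logarithmic discount.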

17 Dataset
• Real-world dataset from YAHUOKU¹
– The largest online auction site in Japan
– Operated by Yahoo! Japan
• Auction transactions
– ≈ 16 million transactions
– ≈ 2 million users
– ≈ 550 blacklisted users
– ≈ 10,000 whitelisted users
[Figure: all users divided into Seller, Bidder, and Mixed.]
¹ auctions.yahoo.co.jp/

18 With vs. Without Dummy Label
             with dummy        w/o dummy
Node type    <NDCG>   SD       <NDCG>   SD
All          0.431    0.015    0.406    0.019
Bidder       0.423    0.026    0.397    0.035
Seller       0.336    0.049    0.284    0.029
Mixed        0.374    0.044    0.319    0.024
• The dummy label gives a real advantage.
• This supports the key idea that innocents tend to interact with their neighbors uniformly (U).

19 Proposed vs. Unsupervised
Compared with:
1) Weighted degree centrality (WDC)
2) Eigenvector centrality (Eigen. C.)
[Figure: four panels, one per node type: All, Bidder, Mixed, Seller.]
• The 2-STEP method outperforms MAD.
• Unsupervised methods yield poor results.
• Fraudulent sellers are more difficult to detect.

20 Sybil Defense Method
• Sybil: malicious attackers who
– create multiple identities
– influence the working of systems
• Shill bidders are one type of Sybil.
• We compared our method with a state-of-the-art Sybil defense method [Viswanath et al., SIGCOMM '10]
– Based on community detection

21 Proposed vs. Sybil
[Figure: two panels labeled All and Sybil, one calculated from the top 100 and one from the top 500.]
• Our method outperforms the state-of-the-art Sybil defense method.
• Fraudsters and innocents may not form well-established communities.

22 Conclusion
• Proposed an online auction fraud detection approach
• Motivated by two main ideas
– Uniformity of innocents (U)
– Homophily (H): fraudsters tend to have higher WDCs.
• Incorporated WDC into the method
• Our extended method yields better results.

Thank you

24 Future Work
• Study the limitations of the method
• Incorporate other heuristics
– Bidding strategy
– Value of products
• Extend the method to heterogeneous networks
[Figure: homogeneous network vs. heterogeneous network.]

25 Scalability
• The optimization process of MAD can be parallelized in the MapReduce framework.
– Map: each node sends its current label to its neighbors
– Reduce: each node updates its label information
• A Hadoop-based implementation is available.
– Junto Label Propagation Toolkit: https://github.com/parthatalukdar/junto/
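The map and reduce roles described above can be mimicked in plain Python. This is a deliberately simplified sketch: it runs a basic weighted label propagation with clamped seeds, not the actual MAD update and not Junto's code, and every name in it (`map_phase`, `reduce_phase`, `propagate`, the toy graph) is hypothetical.

```python
from collections import defaultdict


def map_phase(graph, labels):
    """Map: each node sends its weighted current label vector to its neighbors."""
    for node, neighbors in graph.items():
        for neighbor, weight in neighbors.items():
            yield neighbor, [weight * score for score in labels[node]]


def reduce_phase(messages, seeds, n_labels):
    """Reduce: each node sums the received vectors and normalizes them.

    Seed nodes keep their initial labels (a simplification of MAD's fitting term).
    """
    sums = defaultdict(lambda: [0.0] * n_labels)
    for node, vector in messages:
        for i, score in enumerate(vector):
            sums[node][i] += score
    updated = {}
    for node, vector in sums.items():
        if node in seeds:
            updated[node] = list(seeds[node])
        else:
            total = sum(vector)
            updated[node] = [s / total for s in vector] if total > 0 else vector
    return updated


def propagate(graph, seeds, n_labels, iterations=20):
    labels = {node: list(seeds.get(node, [0.0] * n_labels)) for node in graph}
    for _ in range(iterations):
        updated = reduce_phase(map_phase(graph, labels), seeds, n_labels)
        labels = {node: updated.get(node, labels[node]) for node in graph}
    return labels


# Toy chain: bad -(2)- x -(1)- y -(1)- good; label vectors are [Bad, Good].
graph = {
    "bad": {"x": 2},
    "x": {"bad": 2, "y": 1},
    "y": {"x": 1, "good": 1},
    "good": {"y": 1},
}
seeds = {"bad": [1.0, 0.0], "good": [0.0, 1.0]}
labels = propagate(graph, seeds, n_labels=2)
print({node: [round(s, 3) for s in vec] for node, vec in labels.items()})
```

In a real MapReduce job the messages would be shuffled by node key between the two phases; here a generator plays that role within a single process.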
