
CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS

Meng Jiang, Tsinghua University, Beijing, China. Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang. August 26, 2014, NYC, USA.


  1. CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS Meng Jiang, Tsinghua University, Beijing, China Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang August 26, 2014 – NYC, USA

  2. Fraud Detection: Graph Analysis Problem [www.buyfollowz.org] [buymorelikes.com]

  3. Fraud Detection: Graph Analysis Problem [buycheaplikes.com] [reviewsteria.com]

  4. Our Goals • Given: A graph (large-scale, directed, etc.) • Find: Frauds = Anomalous edges • Goals: • G1. Find patterns that distinguish fraudsters from normal users • G2. Design algorithms that catch fraudsters

  5. OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments

  6. Anomalies in Degree Distributions • Power-law distribution • Datasets: DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom) [konect.uni-koblenz.de/networks/]

  7. Anomalies in Degree Distributions [Figure: 2009; annotations: 3.17M, 0.41M, 41M, d=20]

  8. Linear Classifier with “Degree”: Fail × [Figure: a classifier on the single feature out-degree (=20?) outputs a label in (+1, -1), where +1 = fraud; plot annotations: 3.17M, 0.41M, d=20]

  9. Graph Structure Distorted [Figure: 2011; annotations: 1.91M, 117M, 0.44M, d=64]

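  10. Traditional Fraud Detection • Features fed to the classifier: out-degree (big?), in-degree (small?), #tweet (big?), #url in tweets (big?), #hashtag in tweets (big?) • #tweet, #url in tweets and #hashtag in tweets are content-based features • Classifier output: label in (+1, -1), where +1 = fraud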

  11. Empty Profile?

  12. Few Followers?

  13. Many Followings?

  14. Content: Unavailable? Look Normal? • Features fed to the classifier: out-degree, in-degree, #tweet, #url in tweets, #hashtag in tweets • Content-based features: 0, 0, 0… sorry • Classifier output: label in (+1, -1)

  15. Behavior is the Key • Monetary incentive • Content: what they appear to behave • Behavior/Links: what they have to behave

  16. OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments

  17. Behavior-based Features • Follower behavior ≈ out-degree, 1st left singular vector (hubness), 2nd left singular vector, … • Followee behavior ≈ in-degree, 1st right singular vector (authoritativeness), 2nd right singular vector, …
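
The feature list on this slide can be illustrated with a minimal sketch, assuming a small graph given as (follower, followee) pairs. The function name `behavior_features` and the dense adjacency matrix are illustrative choices, not taken from the talk; a graph at the scale discussed here would need a sparse, truncated SVD instead.

```python
import numpy as np

def behavior_features(edges, n_nodes):
    """Return (follower_features, followee_features) for a directed graph.

    edges: iterable of (follower, followee) integer node ids.
    Follower side: out-degree and hubness (1st left singular vector).
    Followee side: in-degree and authoritativeness (1st right singular vector).
    """
    # Dense adjacency matrix: A[u, v] = 1 means u follows v (toy scale only).
    A = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        A[u, v] = 1.0

    out_degree = A.sum(axis=1)
    in_degree = A.sum(axis=0)

    # Full SVD of the adjacency matrix; singular values come back in
    # descending order, so index 0 is the leading singular pair.
    U, s, Vt = np.linalg.svd(A)
    hubness = np.abs(U[:, 0])       # abs() removes the arbitrary sign
    authority = np.abs(Vt[0, :])

    follower = np.column_stack([out_degree, hubness])
    followee = np.column_stack([in_degree, authority])
    return follower, followee

# Toy graph: nodes 0, 1, 2 all follow node 3; node 0 also follows node 4.
follower, followee = behavior_features([(0, 3), (1, 3), (2, 3), (0, 4)], 5)
```

Each row of `followee` is then one point in the followee-behavior feature space, and each row of `follower` one point in the follower-behavior space.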

  18. Behavior-based Feature Space [Figure: follower behavior and followee behavior]

  19. Fraudulent Behavior Patterns

  20. Fraudulent Behavior Patterns

  21. Fraudulent Behavior Patterns

  22. Fraudulent Behavior Patterns

  23. Fraudulent Behavior Patterns

  24. Fraudulent Behavior Patterns • Synchronized • Abnormal

  25. OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments

  26. Synchronicity and Normality • Synchronicity

  27. Synchronicity and Normality • Normality
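
The formulas on these two slides did not survive in the transcript. The sketch below shows one way the two quantities can be computed, under assumptions that are not confirmed by the slides: the followee feature space is cut into grid cells, two nodes count as "close" when they fall in the same cell, synchronicity of a node is the share of pairs of its followees that are close to each other, and normality is the share of (followee, any node) pairs that are close. The names `grid_cells` and `sync_and_norm` are hypothetical.

```python
import numpy as np
from collections import Counter

def grid_cells(points, n_bins=10):
    """Map each feature point to a grid-cell id (a tuple of bin indices)."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # constant feature: everything lands in bin 0
    idx = np.minimum((n_bins * (points - lo) / span).astype(int), n_bins - 1)
    return [tuple(row) for row in idx]

def sync_and_norm(followees_of, cells):
    """Return {node: (synchronicity, normality)} under the grid assumption.

    followees_of: dict mapping a node to the list of nodes it follows.
    cells: cells[v] is the grid cell of node v in the followee feature space.
    """
    n_nodes = len(cells)
    background = Counter(cells)  # how many nodes sit in each cell overall
    result = {}
    for u, followees in followees_of.items():
        if not followees:
            continue  # both quantities are undefined without followees
        d = len(followees)
        counts = Counter(cells[v] for v in followees)
        # Share of ordered followee pairs that share a cell.
        sync = sum(c * c for c in counts.values()) / (d * d)
        # Share of (followee, any node) pairs that share a cell.
        norm = sum(c * background[g] for g, c in counts.items()) / (d * n_nodes)
        result[u] = (sync, norm)
    return result

# Toy example: node 0 follows three nodes in one cell, node 1 spreads out.
scores = sync_and_norm({0: [0, 1, 2], 1: [3, 4, 5]},
                       ["a", "a", "a", "b", "c", "d"])
```

In this toy example node 0 gets synchronicity 1.0 because all of its followees share a cell, while node 1 gets 1/3 because its followees are spread over three cells.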

  28. Synchronicity-Normality Plot

  29. Theorem • For any distribution, there is a parabolic lower limit in the synchronicity-normality plot. [Plot axes: synchronicity, normality] • Proof. See our paper.
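
The slide defers the proof to the paper, so the following is only a numerical illustration under an assumed grid-based formulation, not the paper's statement. Suppose a node's followees fall into M grid cells with fractions f_g, and all nodes fall into the same cells with fractions p_g; take synchronicity as sum(f_g^2) and normality as sum(f_g * p_g). Minimizing synchronicity for a fixed normality n with Lagrange multipliers, ignoring the constraint f_g >= 0, gives the parabola (M n^2 - 2 n + s_b) / (M s_b - 1), where s_b = sum(p_g^2). The function names are hypothetical.

```python
import numpy as np

def sync_lower_bound(norm, background):
    """Parabolic lower bound on synchronicity for a given normality.

    background: fractions p_g of all nodes per grid cell (must sum to 1 and
    must not be exactly uniform, otherwise the denominator is zero).
    """
    p = np.asarray(background, dtype=float)
    M = p.size
    s_b = float(np.sum(p ** 2))
    return (M * norm ** 2 - 2.0 * norm + s_b) / (M * s_b - 1.0)

def bound_holds(background, trials=1000, seed=0):
    """Check the bound on randomly drawn followee distributions."""
    rng = np.random.default_rng(seed)
    p = np.asarray(background, dtype=float)
    for _ in range(trials):
        f = rng.dirichlet(np.ones(p.size))  # random followee distribution
        sync = float(np.sum(f ** 2))
        norm = float(np.sum(f * p))
        if sync < sync_lower_bound(norm, p) - 1e-12:
            return False
    return True

background = [0.5, 0.3, 0.2]  # made-up background distribution
ok = bound_holds(background)
```

The bound is tight when a node's followees are distributed exactly like the background (f = p): both synchronicity and normality then equal s_b, which lies on the parabola.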

  30. CatchSync Algorithm • Distance-based anomaly detection • Fraudsters • Big synchronicity • Small normality • Away from the densest
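
The slide only names the ingredients (distance-based detection, big synchronicity, small normality, far from the dense part of the plot), so the sketch below swaps in a simple stand-in rather than the authors' procedure: it takes the coordinate-wise median as the dense center, scales each axis by its median absolute deviation, and flags nodes that lie far from the center on the high-synchronicity, low-normality side. The function `flag_suspicious` and the threshold of 3 are assumptions made for illustration.

```python
import numpy as np

def flag_suspicious(sync, norm, threshold=3.0):
    """Flag nodes far from the dense center, toward high sync and low norm.

    sync, norm: per-node synchronicity and normality values.
    Returns a boolean array, True for flagged nodes.
    """
    pts = np.column_stack([np.asarray(sync, dtype=float),
                           np.asarray(norm, dtype=float)])
    center = np.median(pts, axis=0)             # stand-in for the dense region
    mad = np.median(np.abs(pts - center), axis=0)
    scale = np.where(mad > 0, mad, 1.0)         # avoid division by zero
    z = (pts - center) / scale                  # robust per-axis deviation
    dist = np.linalg.norm(z, axis=1)
    # Far away AND on the "big synchronicity, small normality" side.
    return (dist > threshold) & (z[:, 0] > 0) & (z[:, 1] < 0)

# Made-up scores: five ordinary nodes and one highly synchronized node.
sync = [0.10, 0.12, 0.11, 0.09, 0.13, 0.95]
norm = [0.50, 0.52, 0.48, 0.51, 0.49, 0.05]
flags = flag_suspicious(sync, norm)
```

Restricting the flag to one quadrant reflects the slide's description of fraudsters; a node that is far from the center but has low synchronicity is left alone.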

  31. OUTLINE 1. Background 2. Fraudulent Pattern Mining 3. The Algorithm 4. Experiments

  32. Experiments • Q1: Does CatchSync remove anomalies? • Degree distribution • Feature space • Q2: Is CatchSync catching actually fraudulent users? • Q3: Is CatchSync robust?

  33. Q1: Does CatchSync Remove Anomalies? [Figure: 2009; annotations: 3.17M, 41M, 0.41M, d=20]

  34. Q1: Does CatchSync Remove Anomalies? [Figure: 2011; annotation: 117M]

  35. Before CatchSync [Figure: follower behavior and followee behavior]

  36. After CatchSync [Figure: follower behavior and followee behavior]

  37. Q2: Is CatchSync Catching Actually Fraudulent Users? 173/1,000 237/1,000

  38. Q2: Is CatchSync Catching Actually Fraudulent Users? [Bar chart, axis 0 to 1] • CatchSync+SPOT: 0.813 • CatchSync: 0.751 • SPOT: 0.597 • OutRank: 0.412

  39. Q2: Is CatchSync Catching Actually Fraudulent Users? [Bar chart, axis 0 to 1] • CatchSync+SPOT: 0.785 • CatchSync: 0.694 • SPOT: 0.653 • OutRank: 0.377

  40. Q2: Is CatchSync Catching Actually Fraudulent Users? • Recall = 80% • Precision in Twitter: 83.5% • Precision in Tencent Weibo: 79.4%

  41. Q3: Is CatchSync Robust to Camouflage? [Figure labels: target, popular camouflage, random camouflage]

  42. Q3: Is CatchSync Robust to Camouflage?

  43. Q3: Is CatchSync Robust to Camouflage?

  44. Q3: Is CatchSync Robust to Camouflage? [Figure panels: popular camouflage, random camouflage]

  45. Conclusion • Goals • G1. Find patterns that distinguish fraudulent user behavior from normal behavior • A1: Synchronized & Abnormal! • G2. Design algorithms that catch fraudsters • A2: CatchSync! • Remove spikes • Content free • Robust to camouflage

  46. Questions? Meng Jiang mjiang89@gmail.com http://www.meng-jiang.com
