CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS Meng Jiang, Tsinghua University, Beijing, China Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang August 26, 2014 – NYC, USA
2 Fraud Detection: Graph Analysis Problem [www.buyfollowz.org] [buymorelikes.com]
3 Fraud Detection: Graph Analysis Problem [buycheaplikes.com] [reviewsteria.com]
4 Our Goals • Given: A graph (large-scale, directed, etc.) • Find: Frauds = Anomalous edges • Goals: • G1. Find patterns that distinguish fraudsters from normal users • G2. Design algorithms that catch fraudsters
5 OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments
6 Anomalies in Degree Distributions • Power-law distribution • Datasets: DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom) [konect.uni-koblenz.de/networks/]
7 Anomalies in Degree Distributions [Figure: degree distribution, 2009; plot annotations: 41M, 3.17M, 0.41M, d=20]
8 Linear Classifier with “Degree”: Fail • Classifier: out-degree → label (+1, -1), where +1 = fraud • Candidate rule: out-degree = 20? (d=20; plot annotations: 3.17M, 0.41M) • Verdict: ×
9 Graph Structure Distorted [Figure: degree distribution, 2011; plot annotations: 117M, 1.91M, 0.44M, d=64]
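10 Traditional Fraud Detection • Classifier: features → label (+1, -1), where +1 = fraud • Features: out-degree (big?), in-degree (small?), #tweet (big?), #url in tweets (big?), #hashtag in tweets (big?) • #tweet, #url in tweets and #hashtag in tweets are content-based features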
11 Empty Profile?
12 Few Followers?
13 Many Followings?
14 Content: Unavailable? Look Normal? • Content-based features (#tweet, #url in tweets, #hashtag in tweets): 0, 0, 0… sorry • Classifier: out-degree, in-degree and content-based features → label (+1, -1)
15 Behavior is the Key • Monetary incentive • Content: what they appear to behave • Behavior/Links: what they have to behave
16 OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments
17 Behavior-based Features • Follower behavior ≈ out-degree, 1st left singular vector (hubness), 2nd left singular vector, … • Followee behavior ≈ in-degree, 1st right singular vector (authoritativeness), 2nd right singular vector, …
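A minimal sketch of how such features could be computed, assuming the graph is given as a list of (follower, followee) pairs. The function name `degree_and_hits_features`, the toy edge list, and the use of HITS-style power iteration (which converges to the first left and right singular vectors of the adjacency matrix) are illustrative assumptions, not the authors' implementation.

```python
import math

def degree_and_hits_features(edges, n, iters=100):
    """edges: list of (follower, followee) pairs over nodes 0..n-1 (hypothetical input format)."""
    out_deg = [0] * n
    in_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1   # follower behavior: how many accounts u follows
        in_deg[v] += 1    # followee behavior: how many followers v has

    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority update: auth = A^T hub
        new_auth = [0.0] * n
        for u, v in edges:
            new_auth[v] += hub[u]
        # Hub update: hub = A auth
        new_hub = [0.0] * n
        for u, v in edges:
            new_hub[u] += new_auth[v]
        # Normalize both vectors to unit length
        a_norm = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        auth = [x / a_norm for x in new_auth]
        hub = [x / h_norm for x in new_hub]
    return out_deg, in_deg, hub, auth

# Toy graph: node 2 is followed by nodes 0, 1 and 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 2)]
out_deg, in_deg, hub, auth = degree_and_hits_features(edges, n=4)
```

Here `(out_deg[u], hub[u])` would serve as a follower-behavior point for node u and `(in_deg[v], auth[v])` as a followee-behavior point for node v.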
18 Behavior-based Feature Space [Figure panels: follower behavior, followee behavior]
19 Fraudulent Behavior Patterns
24 Fraudulent Behavior Patterns • Synchronized • Abnormal
25 OUTLINE 1. Background 2. Fraudulent Pattern 3. The Algorithm 4. Experiments
26 Synchronicity and Normality • Synchronicity
27 Synchronicity and Normality • Normality
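The slide text does not reproduce the formulas, so the following is only a sketch under stated assumptions: the followee feature space is divided into grid cells, two nodes count as "close" when they fall in the same cell, synchronicity is the average closeness between pairs of one user's followees, and normality is the average closeness between that user's followees and all nodes. The function name `sync_and_norm` and the cell labels are hypothetical.

```python
from collections import Counter

def sync_and_norm(followee_cells, all_cells):
    """followee_cells: grid-cell ids of one user's followees (assumed non-empty).
    all_cells: grid-cell ids of every node in the graph (assumed non-empty)."""
    d, n = len(followee_cells), len(all_cells)
    p = Counter(followee_cells)   # followees per cell
    q = Counter(all_cells)        # all nodes per cell
    # Fraction of followee pairs that share a cell
    sync = sum((count / d) ** 2 for count in p.values())
    # Fraction of (followee, any node) pairs that share a cell
    norm = sum((p[cell] / d) * (q[cell] / n) for cell in p)
    return sync, norm

# Toy background: 10 nodes spread over three cells.
all_cells = ["a"] * 6 + ["b"] * 2 + ["c"] * 2
sync_fraud, norm_fraud = sync_and_norm(["c", "c"], all_cells)              # followees all alike
sync_normal, norm_normal = sync_and_norm(["a", "a", "b", "c"], all_cells)  # followees spread out
```

In this toy case the user whose followees all sit in one sparse cell gets a higher synchronicity and a lower normality than the user whose followees are spread across cells.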
28 Synchronicity-Normality Plot
29 Theorem • For any distribution, there is a parabolic lower limit in the synchronicity-normality plot. [Plot axes: synchronicity, normality] • Proof. See our paper
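One way to see why the limit is parabolic, as a sketch rather than the authors' proof: assume the feature space is split into M grid cells, let p_i be the fraction of a user's followees in cell i and q_i the fraction of all nodes in cell i, and assume closeness is 1 for nodes in the same cell and 0 otherwise. These symbols and assumptions are introduced here for illustration. Minimizing synchronicity at a fixed normality n with Lagrange multipliers gives a quadratic in n:

```latex
\begin{aligned}
\mathrm{sync}(u) &= \sum_{i=1}^{M} p_i^2, \qquad
\mathrm{norm}(u) = \sum_{i=1}^{M} p_i q_i, \qquad
s_b = \sum_{i=1}^{M} q_i^2, \\
&\min_{p}\ \sum_{i} p_i^2
\quad \text{s.t.} \quad \sum_{i} p_i = 1,\ \ \sum_{i} p_i q_i = n
\;\Longrightarrow\; p_i = a + b\,q_i, \\
b &= \frac{Mn - 1}{M s_b - 1}, \qquad a = \frac{1 - b}{M}, \\
\mathrm{sync}_{\min}(n) &= \frac{1}{M} + b^2\left(s_b - \frac{1}{M}\right)
= \frac{-M n^2 + 2n - s_b}{1 - M s_b}.
\end{aligned}
```

The derivation requires M s_b ≠ 1, that is, a background distribution that is not uniform over the cells. Dropping the constraint p_i ≥ 0 can only lower the minimum, so the expression remains a valid lower limit.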
30 CatchSync Algorithm • Distance-based anomaly detection • Fraudsters • Big synchronicity • Small normality • Away from the densest
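A rough sketch of distance-based flagging in the synchronicity-normality plane. The slide does not give the scoring rule, so everything here is an assumption for illustration: the median point stands in for the densest region, the cutoff is the mean distance plus k standard deviations, and the function name `flag_suspicious` and the toy data are hypothetical.

```python
import math
import statistics

def flag_suspicious(points, k=2.0):
    """points: dict mapping user -> (synchronicity, normality). Returns the flagged users."""
    syncs = [s for s, _ in points.values()]
    norms = [n for _, n in points.values()]
    # Assumption: the median point approximates the densest region of the plot.
    center = (statistics.median(syncs), statistics.median(norms))
    dist = {u: math.dist(p, center) for u, p in points.items()}
    values = list(dist.values())
    cutoff = statistics.mean(values) + k * statistics.pstdev(values)
    # Flag users that are far from the center, with big synchronicity and small normality.
    return {
        u for u, (s, n) in points.items()
        if dist[u] > cutoff and s > center[0] and n < center[1]
    }

# Toy data: 20 ordinary users clustered together plus two synchronized outliers.
points = {f"user_{i}": (0.10 + 0.01 * (i % 5), 0.40 + 0.01 * (i % 4)) for i in range(20)}
points["fraud_a"] = (0.95, 0.05)
points["fraud_b"] = (0.90, 0.08)
flagged = flag_suspicious(points)
```

The threshold k trades precision against recall; a real system would tune it, or replace the median heuristic with the distance to the lower limit of the plot.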
31 OUTLINE 1. Background 2. Fraudulent Pattern Mining 3. The Algorithm 4. Experiments
32 Experiments • Q1: Does CatchSync remove anomalies? • Degree distribution • Feature space • Q2: Is CatchSync catching actually fraudulent users? • Q3: Is CatchSync robust?
33 Q1: Does CatchSync Remove Anomalies? [Figure: degree distribution, 2009; plot annotations: 41M, 3.17M, 0.41M, d=20]
34 Q1: Does CatchSync Remove Anomalies? [Figure: degree distribution, 2011; plot annotation: 117M]
35 Before CatchSync [Figure panels: follower behavior, followee behavior]
36 After CatchSync [Figure panels: follower behavior, followee behavior]
37 Q2: Is CatchSync Catching Actually Fraudulent Users? 173/1,000 237/1,000
38 Q2: Is CatchSync Catching Actually Fraudulent Users? [Bar chart, axis 0 to 1] • CatchSync+SPOT: 0.813 • CatchSync: 0.751 • SPOT: 0.597 • OutRank: 0.412
39 Q2: Is CatchSync Catching Actually Fraudulent Users? [Bar chart, axis 0 to 1] • CatchSync+SPOT: 0.785 • CatchSync: 0.694 • SPOT: 0.653 • OutRank: 0.377
40 Q2: Is CatchSync Catching Actually Fraudulent Users? • At recall = 80%: • Precision in Twitter: 83.5% • Precision in Tencent Weibo: 79.4%
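For readers unfamiliar with the metric, precision at a fixed recall can be computed from a ranked list of ground-truth labels as in the sketch below. The function name `precision_at_recall` and the toy ranking are hypothetical; the slides do not describe how the reported figures were computed.

```python
def precision_at_recall(ranked_labels, target_recall=0.8):
    """ranked_labels: 1 = fraud, 0 = normal, ordered from most to least suspicious."""
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    for k, label in enumerate(ranked_labels, start=1):
        hits += label
        # Stop at the shortest prefix that reaches the target recall.
        if hits / total_pos >= target_recall:
            return hits / k
    return total_pos / len(ranked_labels)  # only reached if target_recall > 1

# Toy ranking: 5 fraudsters among 10 users; 4 of them appear in the top 5.
precision = precision_at_recall([1, 1, 0, 1, 1, 0, 1, 0, 0, 0], target_recall=0.8)
```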
41 Q3: Is CatchSync Robust to Camouflage? Target Popular camouflage Random camouflage
44 Q3: Is CatchSync Robust to Camouflage? [Figure panels: popular camouflage, random camouflage]
45 Conclusion • Goals • G1. Find patterns that distinguish fraudulent user behavior from normal behavior • A1: Synchronized & Abnormal! • G2. Design algorithms that catch fraudsters • A2: CatchSync! • Remove spikes • Content free • Robust to camouflage
46 Questions? Meng Jiang mjiang89@gmail.com http://www.meng-jiang.com