Dagstuhl: Cybersafety in Modern Online Social Networks
Combating Friend Spam Using Social Rejections
Michael Sirivianos, Cyprus University of Technology
Joint work with Q. Cao, Xiaowei Yang and K. Munagala at Duke University

Friend Spam in online social networks (OSNs)
§ Friend spam: unwanted friend requests
  Ø Degrades user experience (e.g., annoying)
  Ø Introduces false OSN links
[Figure: fake account]

False OSN links are harmful
§ Pollute the underlying social graph
  Ø Detrimental to social search and online ad targeting
  Ø Jeopardize online privacy and safety

False OSN links undermine the effectiveness of Sybil defense
§ The defense relies on genuine social links
  Ø SybilLimit [S&P’08], SybilInfer [NDSS’09], SybilRank [NSDI’12]
  Ø The number of undetected Sybils (fake accounts) is bounded by O(log |V|) per link between Sybils and legitimate users
[Figure: OSN links between the non-Sybil region and the Sybil region]

Existing counter-measures
§ Privacy settings for OSN users
  Ø Restrict requests to those from friends of friends
  Ø Reduce the openness of the OSN
§ Spam request filtering using machine learning (ML)
  Ø Facebook Immune System (SNS’11)
  Ø Individual user features are manipulable

Rejecto: Combating friend spam using social rejections

Observation: the cost of connecting to real users
§ False OSN links come with social rejections
  Ø Social rejections: rejected, ignored, and reported requests
  Ø Spam requests are less likely to be accepted
[Figure: friend spammers receive many rejections from legitimate users]

Live fake accounts in the wild
§ Each has a significant number of pending requests
  Ø Fake Facebook accounts from an underground market
  Ø More measurement results in the paper
[Figure: number of requests (pending requests vs. friends) per anonymized fake account ID]

How reliable is social rejection?
§ Attackers inevitably trigger rejections
  Ø Disproportionally large number of accounts and requests
  Ø Requests inevitably hit cautious users
§ Rejection towards innocent users is non-manipulable
  Ø A rejection is guarded by a feedback loop between the request sender and the receiver
  Ø Legitimate users rarely receive rejections
  Ø Fundamentally different from negative ratings on online services (e.g., YouTube)

Challenges in using social rejection
§ Attack strategies
  Ø Collusion: fake accounts collude to accept each other's requests
  Ø Arbitrarily boosts the request acceptance rate of an individual account
  Ø Self-rejection: mimic legitimate users rejecting others
  Ø Whitewashes the fake accounts that reject other fake accounts
§ System challenge
  Ø Enormous user base with a large number of requests and rejections

Rejecto in a nutshell
§ A strategy-proof formulation
  Ø Graph cut on a rejection-augmented social graph
  Ø Low aggregate acceptance rate of the requests from spammers to legitimate users
§ An effective and near-linear algorithm
  Ø Based on the Kernighan-Lin (KL) algorithm [The Bell System Technical Journal, 1970]
§ A scalable implementation
  Ø Layered on top of Apache Spark [Zaharia et al. NSDI’12]

Rejecto’s formulation of spammer detection
§ Main idea: put spamming accounts into groups
  Fake accounts cannot arbitrarily improve their AAR
§ Aggregate acceptance rate (AAR)
  $$\frac{F(H, S)}{F(H, S) + \vec{R}(H, S)}$$
[Figure: a cut between regions H and S]

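A minimal sketch of how the AAR above could be computed, assuming H is everyone outside a suspected group S, friendships are undirected pairs, and rejections are directed (rejecter, sender) pairs. The function name and data layout are illustrative assumptions, not taken from the Rejecto implementation.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]


def aggregate_acceptance_rate(
    suspects: Set[str],
    friendships: Iterable[Edge],
    rejections: Iterable[Edge],
) -> Optional[float]:
    """AAR of the requests exchanged between `suspects` (S) and the rest (H).

    friendships: undirected accepted links (u, v).
    rejections:  directed edges (rejecter, rejected sender).
    Returns None when no link or rejection crosses the cut.
    """
    # F(H, S): accepted links with exactly one endpoint among the suspects.
    accepted = sum((u in suspects) != (v in suspects) for u, v in friendships)
    # R(H, S): rejections cast by non-suspects on suspects.
    rejected = sum(r not in suspects and s in suspects for r, s in rejections)
    total = accepted + rejected
    return accepted / total if total else None


# Toy data (made up): x and y are the suspected group.
friendships = [("a", "b"), ("b", "c"), ("x", "a"), ("x", "y")]
rejections = [("b", "x"), ("c", "x"), ("a", "y"), ("x", "y")]
rate = aggregate_acceptance_rate({"x", "y"}, friendships, rejections)
```

Only one accepted link and three rejections cross the cut in this toy data; the link and the rejection inside the group are ignored.
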
Spam requests lead to a low aggregate acceptance rate
§ Lower than that of the requests from a set of legitimate users
  Ø Spam requests are less likely to be accepted
[Figure: a cut with a small AAR ratio]

A graph cut model
§ Augments a social graph with rejections
  Ø Directed rejection edges
§ Finds the cut with the minimum aggregate acceptance rate (MAAR)
  Ø Graph partitioning based on requests and rejections
§ Iteratively cuts off groups of suspicious accounts
  Ø Prunes their links and rejections from the social graph

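The cut-and-prune loop above can be sketched as follows. The class layout, the `find_cut` callback, the `threshold` stopping rule and `max_rounds` are all assumptions made for illustration; the slides do not specify how Rejecto stores the graph or decides when to stop.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class RejectionGraph:
    """Social graph augmented with directed rejection edges (hypothetical layout)."""
    nodes: Set[str] = field(default_factory=set)
    friends: Set[frozenset] = field(default_factory=set)            # undirected OSN links
    rejections: Set[Tuple[str, str]] = field(default_factory=set)   # (rejecter, sender)

    def add_friendship(self, u: str, v: str) -> None:
        self.nodes |= {u, v}
        self.friends.add(frozenset((u, v)))

    def add_rejection(self, rejecter: str, sender: str) -> None:
        self.nodes |= {rejecter, sender}
        self.rejections.add((rejecter, sender))

    def prune(self, group: Set[str]) -> None:
        """Remove a group together with every link and rejection touching it."""
        self.nodes -= group
        self.friends = {e for e in self.friends if not (e & group)}
        self.rejections = {
            (r, s) for r, s in self.rejections if r not in group and s not in group
        }


# A cut finder returns (suspicious group, its AAR), or None if it finds nothing.
CutFinder = Callable[[RejectionGraph], Optional[Tuple[Set[str], float]]]


def detect_iteratively(
    graph: RejectionGraph, find_cut: CutFinder, threshold: float, max_rounds: int = 10
) -> List[Set[str]]:
    """Repeatedly cut off the lowest-AAR group until its AAR exceeds `threshold`."""
    flagged: List[Set[str]] = []
    for _ in range(max_rounds):
        result = find_cut(graph)
        if result is None:
            break
        group, aar = result
        if not group or aar > threshold:
            break
        flagged.append(set(group))
        graph.prune(set(group))
    return flagged
```

Any cut-finding routine can be plugged in as `find_cut`; pruning after each round means later rounds see the graph without the accounts already flagged.
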
Immune to attack strategies
§ Collusion among spammers cannot improve MAAR
  Ø MAAR does not count the colluding links
§ Self-rejection only exposes the rejected subset of accounts to Rejecto earlier
  Ø Iteratively identify groups of spamming accounts

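A toy check of both claims, assuming the AAR of a group counts only accepted links and rejections that cross the group boundary. The graph, the account names and the helper `aar` are made up for illustration.

```python
def aar(group, friendships, rejections):
    """Accepted cross links / (accepted cross links + rejections cast on the group)."""
    accepted = sum((u in group) != (v in group) for u, v in friendships)
    rejected = sum(r not in group and s in group for r, s in rejections)
    return accepted / (accepted + rejected)


fakes = {"f1", "f2", "f3", "f4"}
friendships = [("h1", "h2"), ("h2", "h3"), ("f1", "h1")]
# Directed rejections as (rejecter, sender).
rejections = [("h2", "f1"), ("h3", "f2"), ("h1", "f3"), ("h2", "f4")]

baseline = aar(fakes, friendships, rejections)

# Collusion: the fakes befriend each other. These links stay inside the
# group, so they never cross the cut and the AAR does not move.
colluding = friendships + [
    ("f1", "f2"), ("f1", "f3"), ("f1", "f4"),
    ("f2", "f3"), ("f2", "f4"), ("f3", "f4"),
]
after_collusion = aar(fakes, colluding, rejections)

# Self-rejection: f1 and f2 reject f3 and f4 to look like legitimate users.
self_rejecting = rejections + [
    ("f1", "f3"), ("f1", "f4"), ("f2", "f3"), ("f2", "f4"),
]
after_self_rejection = aar(fakes, friendships, self_rejecting)
rejected_subset = aar({"f3", "f4"}, friendships, self_rejecting)
```

In this toy graph the whole group keeps the same AAR under both strategies, while the rejected subset {f3, f4} ends up with an AAR of zero, so an iterative detector would cut it off first.
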
Finding the MAAR cut is challenging
§ Finding the MAAR cut is NP-hard
  Ø Reduced from the MIN-RATIO-CUT problem [Leighton & Rao, JACM’99]
  Ø Detailed reduction in the paper

Existing work on cut-based problems in undirected graphs
§ State of the art: O(log |V|) approximation algorithms with a complexity of O(|V|^2)
  Ø Summarized by Madry [FOCS’10]
§ Shortcomings in the OSN context
  Ø The approximation factor O(log |V|) is too loose
  Ø O(|V|^2) complexity is prohibitive
  Ø Do not support parallel graph processing

Our approach: an effective and efficient search algorithm
§ Finds a MAAR cut by interchanging misplaced nodes
  Ø Based on the Kernighan-Lin (KL) algorithm
  Ø O(|V|) complexity
  Ø Can scale up to multimillion-node social graphs

A primer on the Kernighan-Lin (KL) algorithm
§ Searches for a balanced cut in undirected graphs
  Ø Minimizes the number of cross-partition edges
  Ø Reduces cross-partition edges by swapping nodes
  Ø Fiduccia and Mattheyses improved it to O(|V|) [DAC’82]
  Ø Widely used in VLSI layout design
§ How to use KL to find the MAAR cut?
  Ø Additional directed rejection edges
  Ø Non-linear MAAR objective function
[Figure: nodes swapped between partitions U and V-U]

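For readers unfamiliar with KL, here is a minimal sketch of one pass of the classic algorithm on an unweighted, undirected graph. It recomputes gains from scratch for clarity, so it is neither the linear-time Fiduccia-Mattheyses variant nor Rejecto's extended version; the function names are illustrative.

```python
def cut_size(edges, side_a):
    """Number of edges with exactly one endpoint in side_a."""
    return sum((u in side_a) != (v in side_a) for u, v in edges)


def kl_pass(edges, side_a, side_b):
    """One Kernighan-Lin pass; returns the improved (side_a, side_b)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    work_a, work_b = set(side_a), set(side_b)  # tentative partition
    locked, swaps, gains = set(), [], []

    def d_value(node, own_side):
        # External minus internal edge count of `node`.
        nbrs = adj.get(node, set())
        external = sum(1 for m in nbrs if m not in own_side)
        return external - (len(nbrs) - external)

    for _ in range(min(len(work_a), len(work_b))):
        best = None
        for x in sorted(work_a - locked):
            for y in sorted(work_b - locked):
                # Cut reduction if x and y trade places.
                gain = (d_value(x, work_a) + d_value(y, work_b)
                        - 2 * (y in adj.get(x, set())))
                if best is None or gain > best[0]:
                    best = (gain, x, y)
        gain, x, y = best
        work_a.remove(x); work_a.add(y)
        work_b.remove(y); work_b.add(x)
        locked.update((x, y))       # each node moves at most once per pass
        swaps.append((x, y))
        gains.append(gain)

    # Keep only the prefix of swaps with the largest cumulative gain.
    best_k, best_total, running = 0, 0, 0
    for i, g in enumerate(gains, start=1):
        running += g
        if running > best_total:
            best_k, best_total = i, running

    new_a, new_b = set(side_a), set(side_b)
    for x, y in swaps[:best_k]:
        new_a.remove(x); new_a.add(y)
        new_b.remove(y); new_b.add(x)
    return new_a, new_b


# Two triangles joined by the edge 3-4, starting from a poor partition.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
start_a, start_b = {1, 2, 4}, {3, 5, 6}
end_a, end_b = kl_pass(edges, start_a, start_b)
```

On this toy graph a single pass swaps nodes 3 and 4, which separates the two triangles and leaves only the edge 3-4 crossing the cut.
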
Transforming the MAAR cut problem
§ Convert to a set of bipartition problems
  Ø Each with a linear objective function
  Ø Rejection and social links can be unified
  MAAR objective: $$\frac{F(V-S, S)}{F(V-S, S) + \vec{R}(V-S, S)}$$
  Linear objective: $$F(V-S, S) - k \times \vec{R}(V-S, S)$$
  Solvable by KL after unifying the rejections and OSN links according to the parameter k

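One way to see why a linear objective can stand in for the ratio is the short derivation below. It assumes that F and R abbreviate F(V-S, S) and the rejection count over the same cut, that at least one link or rejection crosses the cut, and that a in (0, 1) is a target acceptance rate; the symbol a is introduced here for illustration only.

```latex
\[
\frac{F}{F + \vec{R}} \le a
\;\Longleftrightarrow\; F \le a\,(F + \vec{R})
\;\Longleftrightarrow\; (1 - a)\,F \le a\,\vec{R}
\;\Longleftrightarrow\; F - k\,\vec{R} \le 0,
\qquad k = \frac{a}{1 - a}.
\]
```

Under these assumptions, a cut with acceptance rate at most a exists exactly when some cut makes the linear objective non-positive for k = a / (1 - a), which suggests why trying a range of k values can locate the minimum ratio.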
Putting it all together
§ Transform to a family of bipartition problems, each minimizing $F(\bar{U}, U) - k \times \vec{R}(\bar{U}, U)$
  Ø Iterate k through a geometric sequence to cover the MAAR cut ratio $k^*$
§ Extend KL to solve each of the converted problems
  Ø Unify rejections and social links using k
  Ø Detailed algorithm in the paper
§ Pick the cut with the lowest aggregate acceptance rate

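The sweep over k can be sketched on a toy graph as follows. An exhaustive search over subsets stands in for the extended KL step, which the slides defer to the paper, so this only works on tiny inputs; the range of k values, the function names and the graph are assumptions made for illustration.

```python
from itertools import combinations


def cut_counts(group, friendships, rejections):
    """Return (accepted cross links, rejections cast on the group from outside)."""
    f = sum((u in group) != (v in group) for u, v in friendships)
    r = sum(a not in group and b in group for a, b in rejections)
    return f, r


def best_cut_for_k(nodes, friendships, rejections, k):
    """Brute-force stand-in for the KL step: minimize F - k * R over all cuts."""
    best_group, best_score = None, None
    for size in range(1, len(nodes)):
        for combo in combinations(sorted(nodes), size):
            f, r = cut_counts(set(combo), friendships, rejections)
            score = f - k * r
            if best_score is None or score < best_score:
                best_group, best_score = set(combo), score
    return best_group


def maar_by_sweep(nodes, friendships, rejections, k_min=1 / 16, factor=2.0, steps=9):
    """Try k = k_min, k_min * factor, ... and keep the cut with the lowest AAR."""
    best = None
    k = k_min
    for _ in range(steps):
        group = best_cut_for_k(nodes, friendships, rejections, k)
        f, r = cut_counts(group, friendships, rejections)
        if f + r > 0:
            aar = f / (f + r)
            if best is None or aar < best[0]:
                best = (aar, group)
        k *= factor
    return best


# Toy graph (made up): four legitimate users, three colluding fakes.
nodes = {"h1", "h2", "h3", "h4", "f1", "f2", "f3"}
friendships = [
    ("h1", "h2"), ("h2", "h3"), ("h3", "h4"), ("h4", "h1"), ("h1", "h3"),
    ("f1", "f2"), ("f2", "f3"), ("f1", "f3"),   # collusion among fakes
    ("f1", "h1"),                               # the only accepted spam request
]
rejections = [  # (rejecter, sender)
    ("h2", "f1"), ("h3", "f1"), ("h3", "f2"),
    ("h4", "f2"), ("h2", "f3"), ("h4", "f3"),
]
aar, group = maar_by_sweep(nodes, friendships, rejections)
```

On this toy graph the sweep returns the three fake accounts, whose cut has one accepted link against six rejections.
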
Evaluation
§ Extensive simulations on real social networks
  Ø Sensitivity analysis
  Ø Resilience to attack strategies
  Ø Compared to VoteTrust
§ Simulations under Sybil attack
  Ø In-depth defense with social-graph-based Sybil defense
§ A Rejecto prototype on an Amazon EC2 cluster
  Ø Performance analysis on large graph processing

Rejecto is insensitive to the spam request volume
§ Request flooding attacks on a Facebook sample graph
  Ø Fake accounts connect with each other as normal users do
Rejecto uncovers fakes behind the active spamming ones
[Figure, two panels comparing Rejecto and VoteTrust: precision/recall vs. number of requests per fake account]
  Left panel: all fake accounts send out spam requests
  Right panel: only half of the fake accounts send out spam requests

Rejecto is resilient to attack strategies
§ Our MAAR cut model is immune to manipulation
[Figure, two panels comparing Rejecto and VoteTrust, y-axis: precision/recall]
  Collusion strategy to form dense connections among fake accounts; x-axis: # of non-attack edges per fake account
  Self-rejection strategy to let half of the fakes reject the rest like legitimate users do; x-axis: self-rejection rate among fake accounts

Rejecto and social-graph-based Sybil detection form a defense in depth
§ Rejecto makes it hard for fakes to get additional links
  Ø Defense in depth with SybilRank
[Figure: area under the ROC curve vs. number of accounts removed by Rejecto, for the Facebook and ca-AstroPh graphs, with the improvement marked]

Rejecto can handle multimillion-user social graphs
§ Performance on an EC2 cluster
  Ø Spark 0.9.2
  Ø 5 c3.8xlarge VMs
  Ø A larger cluster yields better performance
The execution time grows gracefully with the graph size

# Users          0.5M      1M        2M         5M         10M
# Edges          ~8M       ~16M      ~32M       ~80M       ~160M
Execution time   288 sec   669 sec   1767 sec   8049 sec   7.7 hours

Conclusion
Rejecto: uncovers friend spammers using social rejections
  Ø Immune to attack strategies
  Ø Efficient
  Ø Scalable