The rise of novel Twitter social spambots
SoBigData day @EUI, Florence, 11-10-2017
Marinella Petrocchi, IIT-CNR, Pisa, Italy
SPAMBOTS & SOCIAL NETWORKS: AN OPEN PROBLEM
Spambots: (semi-)automated accounts with (often) harmful intentions, such as:
• spreading misinformation
• theft of personal data
• stock market manipulation
• infiltration of political discourse
THE RISE OF THE SOCIAL BOTS
They evade detection techniques by evolving. On Twitter:
• fake followers (until 2012)
• 1st evolution (2012-2014)
• current (?) wave (2015-2017)
New spambots are almost indistinguishable from genuine accounts.
E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, “The rise of social bots,” Communications of the ACM, vol. 59, no. 7, pp. 96–104, 2016
FAKE FOLLOWERS
NAIVE FAKE ACCOUNTS WERE EASY TO BUY
The new wave SOCIAL SPAMBOTS
SOCIAL SPAMBOTS
Indistinguishable from genuine accounts if analyzed one by one.
Approach: analyze the online behavior of large groups of users, with the goal of detecting possible spambots among them.
The idea MODELING THE ONLINE BEHAVIOR OF USERS
Behavior: the sequence of actions performed by an account.
Digital DNA: each type of action is associated with a character (e.g., A, B, C). The online behavior of an account is modeled as a sequence of characters (i.e., a string, similar to biological DNA) following the sequence of actions performed by that account.
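The encoding step can be sketched in a few lines of Python. This is a minimal sketch, assuming each timeline entry is a tweet object in the Twitter API v1.1 JSON layout, where a retweet carries a `retweeted_status` field and a reply has a non-null `in_reply_to_status_id`; the names `ALPHABET`, `action_of` and `digital_dna` are hypothetical. Only the three-letter alphabet (T = tweet, R = retweet, P = reply) comes from these slides.

```python
# Alphabet used in the slides: T = tweet, R = retweet, P = reply.
ALPHABET = {"tweet": "T", "retweet": "R", "reply": "P"}


def action_of(tweet: dict) -> str:
    """Classify one tweet object by action type.

    Assumption: Twitter API v1.1 JSON fields. Retweets are checked first,
    because a retweet also carries a (null) in_reply_to_status_id field.
    """
    if tweet.get("retweeted_status") is not None:
        return "retweet"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "tweet"


def digital_dna(timeline: list[dict]) -> str:
    """Encode a chronologically ordered timeline as a digital DNA string."""
    return "".join(ALPHABET[action_of(t)] for t in timeline)


# Toy timeline (hypothetical data) reproducing the fragment on the next slide.
timeline = [
    {"retweeted_status": {"id": 1}},
    {"retweeted_status": {"id": 2}},
    {"text": "hello"},
    {"retweeted_status": {"id": 3}},
    {"in_reply_to_status_id": 42, "text": "@someone hi"},
    {"retweeted_status": {"id": 4}},
]
print(digital_dna(timeline))  # RRTRPR
```

Keeping classification and encoding in separate functions lets a richer alphabet (for example, one that also encodes content types) be swapped in by changing only `action_of` and `ALPHABET`.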
The idea MODELING THE ONLINE BEHAVIOR OF USERS
Timeline of a Twitter account, encoded action by action with T = tweet, R = retweet, P = reply, e.g. …RRTRPR
DIGITAL DNA VS BIOLOGICAL DNA
Biological DNA alphabet: A = adenine, C = cytosine, G = guanine, T = thymine
Digital DNA alphabet: T = tweet, R = retweet, P = reply
Example digital DNA sequences:
…RRTRPRTPRRPRTPRPTPRRTRPR
…RPRTPTTRPTRPTPRRRRTPPRPP
…TTTRRRPPTPRPTPRTRPTRRRTP
…PRTRPRTPPPPRTPRRPRTPPRRT
…TRTRPRTPRRPRTPRPTPTPPRTT
…TRPPRTPPTRPPTPRRTTTPPRPR
Example biological DNA sequences:
…AGTCTCCATTTTCAGGTCGTA
…GTTTAAGATCGCCTCATCACC
…AGGCAATTCGCCTGAACTGG
…AGTCTCGATCCTTTCCTCGTT
…AAAATCGAACGCCTTGTCGG
…ATTCTCCATCGCCTAAACAAC
Spambots characterization SIMILARITY BETWEEN DIGITAL DNA SEQUENCES
Intuition: automated accounts (spambots) have similar DNA sequences.
LCS (longest common substring): the longest substring shared by N sequences of digital DNA. For example, the sequences
…TRRRPRRTRRPRTPRPTPRRTRPR
…RPRTPTTRRRPRRTPRRRRTPPRP
…TTTRRRPRRRPRRTRTRPTRRRTP
…PRTRPRTPPPPRTPRRRRRPRRTR
share the LCS RRRPRRT (length: 7 characters).
M. Arnold and E. Ohlebusch, “Linear time algorithms for generalizations of the longest common substring problem,” Algorithmica, vol. 60, no. 4, pp. 806–818, 2011
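To compare groups of different sizes, the LCS can be generalized: for each k from 2 to N, take the length of the longest substring shared by at least k of the N sequences, which yields a curve over k. The Python sketch below computes that curve by brute force, hashing every substring of each length. It is quadratic rather than linear, so it is not the suffix-tree algorithm of Arnold and Ohlebusch cited above, only an easy-to-check stand-in; the names `max_support` and `lcs_curve` are hypothetical.

```python
from collections import defaultdict


def max_support(seqs: list[str], length: int) -> int:
    """Largest number of sequences sharing some substring of the given length."""
    owners = defaultdict(set)  # substring -> indices of sequences containing it
    for idx, s in enumerate(seqs):
        for start in range(len(s) - length + 1):
            owners[s[start:start + length]].add(idx)
    return max((len(ids) for ids in owners.values()), default=0)


def lcs_curve(seqs: list[str]) -> dict[int, int]:
    """Map k (2..N) to the length of the longest substring shared by >= k sequences."""
    curve = {k: 0 for k in range(2, len(seqs) + 1)}
    length = 1
    while True:
        best = max_support(seqs, length)
        if best < 2:
            # Support never grows with length: every prefix of a shared
            # substring is shared by at least as many sequences.
            break
        for k in range(2, best + 1):
            curve[k] = length  # lengths only increase, so the last write is the max
        length += 1
    return curve


# The four sequences from the slide (leading ellipses dropped).
seqs = [
    "TRRRPRRTRRPRTPRPTPRRTRPR",
    "RPRTPTTRRRPRRTPRRRRTPPRP",
    "TTTRRRPRRRPRRTRTRPTRRRTP",
    "PRTRPRTPPPPRTPRRRRRPRRTR",
]
curve = lcs_curve(seqs)
print(curve[4])  # 7, the length of RRRPRRT
```

The curve is non-increasing in k by construction, which is what makes a plateau followed by a sharp drop a meaningful signal.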
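Spambots characterization LCS: SPAMBOTS VS HUMANS
The LCS length serves as a similarity measure: the longer the substring shared by a group of accounts, the more similar their behavior.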
Spambots detection LCS: SPAMBOTS + HUMANS (MIXED GROUP)
The LCS curve of a mixed group shows three regions:
1. accounts with high similarity
2. a steep decrease in similarity
3. accounts with low similarity
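One simple way to locate the steep decrease is to take the largest single-step drop of the LCS curve; the group sizes up to that point form the high-similarity region. The sketch below assumes the curve is given as a dict from k to LCS length, and the function name `steepest_drop` is hypothetical. It applies no smoothing, which real, noisy curves would likely need, and it is an illustration rather than the exact detection rule of the cited work.

```python
def steepest_drop(curve: dict[int, int]) -> int:
    """Return the k just before the largest single-step decrease of an LCS curve.

    Ties are broken in favor of the smallest k. With a single point,
    that point is returned.
    """
    ks = sorted(curve)
    best_k, best_drop = ks[0], -1
    for k, k_next in zip(ks, ks[1:]):
        drop = curve[k] - curve[k_next]
        if drop > best_drop:
            best_k, best_drop = k, drop
    return best_k


# Toy curve (made-up numbers): a plateau, a sharp drop, then low similarity.
toy = {2: 40, 3: 39, 4: 38, 5: 12, 6: 11, 7: 10}
print(steepest_drop(toy))  # 4
```

Returning only k keeps the sketch small; flagging the actual accounts would additionally require tracking which sequences share the substring on the plateau.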
Spambots detection DETECTION TECHNIQUES 1. Unsupervised approach
Spambots detection DETECTION TECHNIQUES 2. Supervised approach
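The slides do not detail the supervised approach. Purely as a hypothetical illustration of how labeled digital DNA could drive a classifier, the sketch below scores a candidate string by its pairwise longest-common-substring length against labeled examples and takes a k-nearest-neighbor majority vote. The names `lcs_length` and `classify`, the vote rule, and `k=3` are all assumptions, not the method of the cited papers.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (O(len(a) * len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common substring ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def classify(candidate: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Label a DNA string by majority vote of its k most similar labeled strings."""
    ranked = sorted(labeled, key=lambda item: lcs_length(candidate, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled set (made-up strings, not real accounts).
labeled = [
    ("RRRPRRTRR", "bot"),
    ("TRRRPRRTP", "bot"),
    ("RRRPRRTTT", "bot"),
    ("TPTPTTPRT", "human"),
    ("PTTRPTPPT", "human"),
]
print(classify("PRRRPRRTP", labeled))  # bot
```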
Spambots detection DATASETS
Evaluation datasets:
1. Mixed1 (1,982 accounts): 50% Bot1, 50% humans
2. Mixed2 (928 accounts): 50% Bot2, 50% humans
Spambots detection EVALUATION
C. Yang, R. Harkreader, and G. Gu, “Empirical evaluation and new design for fighting evolving Twitter spammers,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 8, pp. 1280–1293, 2013
Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, “Twitter spammer detection using data stream clustering,” Information Sciences, vol. 260, pp. 64–73, 2014
F. Ahmed and M. Abulaish, “A generic statistical approach for spam detection in online social networks,” Computer Communications, vol. 36, no. 10, pp. 1120–1129, 2013
TAKE-HOME MESSAGES
• New evolutionary wave: social spambots
• Current techniques fail to detect them
• Detection via digital DNA analysis: effective and efficient (lightweight features, no graphs, linear-complexity algorithms)
Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “The Paradigm Shift of social spambots: Evidence, theories, and tools for the arms race”, WWW 2017
Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Social Fingerprinting: Detection of spambot groups through DNA-inspired behavioral modeling”, IEEE Transactions on Dependable and Secure Computing, 2017
Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Exploiting digital DNA for the analysis of similarities in Twitter behaviours”, IEEE Data Science and Analytics, 2017
Questions? THANK YOU!
Marinella Petrocchi
marinella.petrocchi@iit.cnr.it
http://mib.projects.iit.cnr.it/dataset.html