Consequences of Compromise: Characterizing Account Hijacking on Twitter Frank Li UC Berkeley With: Kurt Thomas (UCB → Google), Chris Grier (UCB/ICSI → Databricks), Vern Paxson (UCB/ICSI)
Accounts on Social Networks • Accounts are valuable! – Precursor for abuse (spam, phishing, malware) – Twitter accounts are attractive • Two ways for attackers to get accounts: – Fraudulent accounts – Compromised accounts
Prior Work • Fraudulent accounts – Lots of prior work on detecting and preventing fake accounts • Compromised accounts – COMPA (NDSS '13) – PCA-based Anomaly Detection (USENIX Security '14)
Compromise on Social Networks • Is compromise occurring at large scale? • What do miscreants do with compromised accounts? • Who is being victimized? • How do users react to compromise? • What is causing compromise?
Detecting Compromise • We take an external perspective of Twitter • We analyzed 8.7B tweets containing URLs, gathered from January to October 2013 – 168M users in the data set
Spam Tweets Aweesomeee! I made $171.50 this week so far taking a couple of surveys. http://t.co/cwG67lh4 Awesome! I made $106.03 this week so far just filling out a couple of surveys. http://t.co/PoHBayLz
Meme Tweets
Analysis Pipeline
Identifying Compromised Users
Twitter Stream Data {"created_at":"Fri Oct 10 00:00:24 +0000 2014","id":520363179210072065,"id_str":"520363179210072065","text":"White people http:\/\/t.co\/gcOd6JqKKL","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user": {"id":1912894320,"id_str":"1912894320","name":"Suck My Ass ","screen_name":"Janoskbiebs","location":"","url":null,"description":null,"protected":false,"verified":false,"followers_count":1136,"friends_count":1294,"listed_count":2,"favourites_count":2090,"statuses_count":5113,"created_at":"Sat Sep 28 03:00:09 +0000 2013","utc_offset":-25200,"time_zone":"Arizona","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/479624239423553538\/9xMeKSoG.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/479624239423553538\/9xMeKSoG.jpeg","profile_background_tile":true,"profile_link_color":"31BF9C","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/520080781612290048\/A5pKzHGV_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/520080781612290048\/A5pKzHGV_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1912894320\/1412831879","default_profile":false 
,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":489091648354549760,"id_str":"489091648354549760","indices": [14,36],"media_url":"http:\/\/pbs.twimg.com\/media\/BsmaQ0rIgAA4fMN.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/BsmaQ0rIgAA4fMN.jpg","url":"http:\/\/t.co\/gcOd6JqKKL","display_url":"pic.twitter.com\/gcOd6JqKKL","expanded_url":"http:\/\/twitter.com\/SteveMeans\/status\/489091648522301440\/photo\/1","type":"photo","sizes":{"medium":{"w":600,"h":552,"resize":"fit"},"small":{"w":340,"h":312,"resize":"fit"},"large":{"w":600,"h":552,"resize":"fit"},"thumb": {"w":150,"h":150,"resize":"crop"}},"source_status_id":489091648522301440,"source_status_id_str":"489091648522301440"}]},"extended_entities":{"media":[{"id":489091648354549760,"id_str":"489091648354549760","indices": [14,36],"media_url":"http:\/\/pbs.twimg.com\/media\/BsmaQ0rIgAA4fMN.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/BsmaQ0rIgAA4fMN.jpg","url":"http:\/\/t.co\/gcOd6JqKKL","display_url":"pic.twitter.com\/gcOd6JqKKL","expanded_url":"http:\/\/twitter.com\/SteveMeans\/status\/489091648522301440\/photo\/1","type":"photo","sizes":{"medium":{"w":600,"h":552,"resize":"fit"},"small":{"w":340,"h":312,"resize":"fit"},"large":{"w":600,"h":552,"resize":"fit"},"thumb": {"w":150,"h":150,"resize":"crop"}},"source_status_id":489091648522301440,"source_status_id_str":"489091648522301440"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"en","timestamp_ms":"1412899224466"}
Twitter Stream Data
● created_at (UTC, seconds)
● id (>53 bits)
● text (UTF-8, <=140 char)
● source
● lang (machine-detected, BCP 47)
● in_reply_to_status_id
● in_reply_to_user_id
● in_reply_to_screen_name
● entities
– hashtags
– urls (both URL and domain)
– user_mentions
● user
– id (>53 bits)
– name (<=20 char)
– screen_name (<=15 char)
– description (<=160 char)
– protected
– verified
– followers_count
– friends_count
– statuses_count
– created_at (UTC, seconds)
– lang (user self-declared, BCP 47)
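As a rough illustration of how the fields listed above might be pulled out of one stream record, here is a minimal Python sketch. The function name `extract_fields`, the trimmed `sample_line` record, and the example.com URL are hypothetical and not from the talk; only the field names and the two ids come from the sample record shown earlier. Because ids exceed 53 bits, the sketch reads `id_str` rather than `id`, which avoids precision loss in tools that parse JSON numbers as doubles.

```python
import json

def extract_fields(raw_line):
    """Parse one JSON line from the stream and keep the fields of interest."""
    tweet = json.loads(raw_line)
    user = tweet.get("user") or {}
    entities = tweet.get("entities") or {}
    return {
        # String ids are safe even where numbers are limited to 53 bits.
        "tweet_id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "text": tweet["text"],
        "lang": tweet.get("lang"),
        "urls": [u["expanded_url"] for u in entities.get("urls", [])
                 if u.get("expanded_url")],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "user_mentions": [m["screen_name"]
                          for m in entities.get("user_mentions", [])],
        "user_id": user.get("id_str"),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),
    }

# Hypothetical, heavily trimmed record; the URL is a placeholder.
sample_line = json.dumps({
    "created_at": "Fri Oct 10 00:00:24 +0000 2014",
    "id": 520363179210072065,
    "id_str": "520363179210072065",
    "text": "example text http://t.co/gcOd6JqKKL",
    "lang": "en",
    "user": {"id_str": "1912894320", "screen_name": "Janoskbiebs",
             "followers_count": 1136, "friends_count": 1294},
    "entities": {"hashtags": [], "user_mentions": [],
                 "urls": [{"expanded_url": "http://example.com/survey"}]},
})

record = extract_fields(sample_line)
```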
Infrastructure • Tweets from Twitter Stream • Upload to S3 • Download to our cluster
Filtered Stream ● Access to a filtered stream of URLs ● ~200 GB of data per day, compressed to ~20 GB per day ● In total, 4.1 TB of compressed data for 2013.
Data Collection
Infrastructure Issues ● Twitter feed outage ● EC2 reboot ● EC2 feed application crash ● Low disk space ● Disk failures ● Updates break things
Filtered Stream ● Covers roughly 61% of all tweets with URLs
Sampling Error ● Cluster sizes are under-estimated ● Any graph analysis will under-represent social connectivity
Identifying Compromised Users
Similar Content Example (near-duplicate text, different URL) • Aweesomeee! I made $171.50 this week so far taking a couple of surveys. http://t.co/cwG67lh4 • Awesome! I made $106.03 this week so far just filling out a couple of surveys. http://t.co/PoHBayLz
Clustering Tweets • Cluster on same URLs • Cluster on similar content – Split text into n-grams – Want Jaccard similarity coefficient: J(A, B) = |A ∩ B| / |A ∪ B| – To avoid O(n^2) pairwise comparisons, where n = O(billion), use minhash estimation
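To make the similarity measure concrete, the following Python sketch computes the Jaccard coefficient over n-gram sets for the two spam tweets shown earlier. The slides do not say whether n-grams are word-level or character-level, or which n was used; word-level trigrams (n = 3) and lowercasing are assumptions here, and the helper names `ngrams` and `jaccard` are illustrative only.

```python
def ngrams(text, n=3):
    """Return the set of word-level n-grams of a text (assumed tokenization)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Short texts collapse to a single n-gram (or nothing if empty).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

tweet_1 = ("Aweesomeee! I made $171.50 this week so far taking a couple "
           "of surveys. http://t.co/cwG67lh4")
tweet_2 = ("Awesome! I made $106.03 this week so far just filling out a "
           "couple of surveys. http://t.co/PoHBayLz")

similarity = jaccard(ngrams(tweet_1), ngrams(tweet_2))
print(f"Jaccard similarity: {similarity:.3f}")
```

Computing this for every pair of tweets is the O(n^2) cost the slide refers to, which is why an approximate key-based grouping is used instead.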
Minhash Estimation • Set A = {a1, …, aN}, Set B = {b1, …, bN} • Hash all elements: A' = {h(a1), ..., h(aN)}, B' = {h(b1), ..., h(bN)} • Sort hashes for each set: A'' = {h(a3), h(a7), ...}, B'' = {h(b9), h(b2), ...} • Key for each set is the k smallest hashes: Key_A = h(a3)||h(a7), Key_B = h(b9)||h(b2)
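The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hash function (first 8 bytes of SHA-1), the word-trigram shingling, the default k = 2, and the names `shingles`, `h`, `minhash_key`, and `cluster_by_key` are all hypothetical choices. Sets that share the same k smallest hashes get the same key, so near-duplicates can be grouped in one pass over the data instead of comparing all pairs.

```python
import hashlib
from collections import defaultdict

def shingles(text, n=3):
    """Word-level n-grams joined into strings (assumed tokenization)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for i in range(max(len(tokens) - n + 1, 1))}

def h(element):
    """Hash one element to a 64-bit integer (assumed hash: truncated SHA-1)."""
    digest = hashlib.sha1(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_key(elements, k=2):
    """Hash all elements, sort, and concatenate the k smallest hashes."""
    smallest = sorted({h(e) for e in elements})[:k]
    return "||".join(format(value, "016x") for value in smallest)

def cluster_by_key(texts, k=2):
    """Group texts whose n-gram sets produce the same minhash key."""
    clusters = defaultdict(list)
    for text in texts:
        clusters[minhash_key(shingles(text), k)].append(text)
    return dict(clusters)
```

With k = 1, the probability that two sets share a key equals their Jaccard similarity under an ideal hash function; larger k makes the match stricter, trading recall for fewer false groupings.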