FilterMap: Measuring Censorship Filters at Global Scale
Ram Sundara Raman¹, Adrian Stoll¹, Jakub Dalek², Reethika Ramesh¹, Will Scott³, Roya Ensafi¹
¹University of Michigan, ²The Citizen Lab, ³Independent
24 February 2020

Content Filtering Technologies
Filters, DPIs, middleboxes
● Dual Use Technology
  ○ Intended use - Security
  ○ Side effect - Censorship, surveillance
● Commoditization of filters - High availability, low cost, and advanced features
● Very little, but important, information on use of filters

Netsweeper and Citizen Lab
● Netsweeper - Canadian filter vendor - Provides carrier-grade filtering, dynamic categorization of websites
● Citizen Lab conducted investigations of the use of Netsweeper products over several years
● “Alternative Lifestyles” category used by UAE, others to block LGBTQ content
● Netsweeper removed the option to block this category

Auditing filters can drive change!

Proliferation of Filters

Previous Work
● Biased towards few, well-known filters
● Significant manual effort
  ○ Physical access
  ○ In-country collaborators

Blockpages
● Filters respond with blockpages
● Rich with information
  ○ Trademark of the manufacturing vendor
  ○ Identity of the deploying actor
● Use blockpages to identify censorship filter deployments
● Identification using blockpages is consistent and scalable

Objectives
● Data Collection: Collect many blockpages from filter deployments
● Data Analysis: Identify filters from blockpages

Data Collection
Collect the most comprehensive database of filter blockpages

Data Collection
Censorship measurement techniques frequently observe blockpages
● Volunteer measurement: https://ooni.org/
[Diagram: a volunteer completes a TCP handshake with a server and sends GET https://blocked.com; a response is injected on the path]
Challenges
● Limited scale and ethical constraints

Data Collection
Censorship measurement techniques frequently observe blockpages
● Volunteer measurement: https://ooni.org
● Remote measurement: Quack, VanderSloot et al. [USENIX 2018]
[Diagram: the measurement machine completes a TCP handshake with an echo server (Port 7) and sends GET https://blocked.com, which the echo server sends back; a response can be injected in either direction]
Challenges
● Cannot detect filters on the common ports 80/443

Data Collection
Censorship measurement techniques frequently observe blockpages
● Volunteer measurement: https://ooni.org
● Remote measurement: Quack
● New remote measurement: Hyperquack
  ○ Novel remote measurement technique
  ○ Uses web servers running on ports 80 and 443
  ○ Idea: The response of a web server to a request for a domain it does not host is predictable

Hyperquack
[Diagram: the measurement machine sends a series of requests to the web server at 46.43.36.222]
● GET https://www.ndss-symposium.org
● GET https://www.usenix.org
● GET https://www.sigsac.org

Canonical Templates
● Request several bogus but benign domain patterns (<www>.example1298.<com>)
● From each response, remove commonly changing elements, e.g. date, domain
● If the responses for all tests match, save the result as the canonical template

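The template step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the masking rules (a `Date:` header and the requested domain), the function names `normalize` and `build_template`, and the toy responses are all hypothetical and not taken from the FilterMap implementation.

```python
import re
from typing import Dict, Optional

# Matches a whole "Date: ..." header line without consuming the line ending.
DATE_HEADER = re.compile(r"^Date:[^\r\n]*", re.IGNORECASE | re.MULTILINE)


def normalize(response: str, domain: str) -> str:
    """Mask the parts of a response that change between otherwise equal replies."""
    masked = DATE_HEADER.sub("Date: <date>", response)
    return masked.replace(domain, "<domain>")


def build_template(responses: Dict[str, str]) -> Optional[str]:
    """Return the canonical template if every normalized response is identical.

    `responses` maps each bogus domain to the raw response received for it.
    Returns None when the responses disagree (or when there are none), i.e.
    the server is not predictable enough to be used.
    """
    normalized = {normalize(body, domain) for domain, body in responses.items()}
    return next(iter(normalized)) if len(normalized) == 1 else None


# Toy responses (made up for illustration) for two bogus domains.
example_responses = {
    "www.example1298.com": (
        "HTTP/1.1 301 Moved Permanently\r\n"
        "Date: Mon, 01 Oct 2018 10:00:00 GMT\r\n"
        "Location: https://isp.example/?host=www.example1298.com\r\n\r\n"
    ),
    "example1298.net": (
        "HTTP/1.1 301 Moved Permanently\r\n"
        "Date: Mon, 01 Oct 2018 10:00:07 GMT\r\n"
        "Location: https://isp.example/?host=example1298.net\r\n\r\n"
    ),
}
template = build_template(example_responses)
```

Returning `None` for disagreeing responses reflects the slide's condition that a template is saved only if all tests match; a real system would likely mask more fields than these two.
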
Censorship Detection
● Send HTTP(S) GET requests for sensitive keywords
● If the response differs from the canonical template, then there is censorship
● Run control tests both before and after to ensure consistency
[Diagram: after a TCP handshake, the measurement machine sends GET https://example{1,2,3}.com to the web server and builds the canonical template of the server response from the HTTPS reply (e.g., Status Code: 301 Moved). It then sends GET https://blocked.com (x4); an injected response that differs from the canonical template indicates censorship. Finally it sends GET https://example{1,2,3}.com again and receives the HTTPS reply (e.g., Status Code: 301 Moved).]

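The control/test/control sequence described above could look roughly like the following Python sketch. Several things here are assumptions rather than details from the talk: the `probe` function and its return labels are hypothetical, `fetch` and `normalize` are caller-supplied callables standing in for the real network and masking code, and the "x4" on the slide is read as "send the test request up to four times and require a deviation every time".

```python
from typing import Callable, Sequence

Fetch = Callable[[str], str]           # domain -> raw response text
Normalize = Callable[[str, str], str]  # (response, domain) -> masked text


def controls_match(fetch: Fetch, normalize: Normalize,
                   template: str, controls: Sequence[str]) -> bool:
    """True if every control domain still yields the canonical template."""
    return all(normalize(fetch(d), d) == template for d in controls)


def probe(fetch: Fetch, normalize: Normalize, template: str,
          controls: Sequence[str], test_domain: str, attempts: int = 4) -> str:
    """Classify one test domain as 'unstable', 'no interference' or 'interference'."""
    # Control before: the server must still behave like its template.
    if not controls_match(fetch, normalize, template, controls):
        return "unstable"

    # Test: assumed reading of "x4" - any matching reply clears the domain.
    for _ in range(attempts):
        if normalize(fetch(test_domain), test_domain) == template:
            return "no interference"

    # Control after: rule out a server that changed behavior mid-test.
    if not controls_match(fetch, normalize, template, controls):
        return "unstable"
    return "interference"
```

Passing `fetch` in as a parameter keeps the decision logic separate from the network code, so the logic can be exercised with a stub that returns a blockpage for one domain and the template-shaped reply for all others.
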
53 million public HTTP hosts
Source - censys.io

Vantage Point Selection
● We use infrastructural servers to reduce risk
● PeeringDB - list of official websites of Internet service providers
● Use the servers hosting these websites for measurement (~10,000)
● Example: https://corporate.comcast.com/ (23.219.228.121)

Ethics
● Followed all the ethical recommendations made in Quack
● Made it clear that we are running measurements on our website
● Rate limit and close connections
● Make only one measurement at a time to a server
● OONI obtains informed consent

Measurements
● Latitudinal Measurements:
  ○ 3 weeks in October 2018
  ○ HyperQuack - 9,223 VPs
  ○ Quack - 33,602 VPs
  ○ 18,736 domains - Citizen Lab Test List
  ○ Added OONI data
● Longitudinal Measurements:
  ○ HyperQuack and Quack twice a week - November 2018 to January 2019
  ○ Citizen Lab Global List (~1200 domains) + Alexa Top 1000 domains

Data Analysis
Automate the identification of filters from more than a million disrupted responses

Iterative Classification
● Insight: Filters often send the same blockpage regardless of the test domain
● Recursively finds large groups of HTML pages with the same content
● Blockpage clusters are labeled with signatures, a unique subset of the HTML page or header
● Example: <th>Barracuda NextGen Firewall:</th>

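One way to picture the iterative step is the Python sketch below. It is an assumption-laden illustration, not the authors' code: `iterative_classification` and its parameters are hypothetical names, the `pick_signature` callback stands in for whatever manual or automated step chooses a label and signature for a group, and `min_group_size` is an invented stopping threshold.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Given a representative page, return (cluster label, signature substring).
PickSignature = Callable[[str], Tuple[str, str]]


def iterative_classification(
    pages: Sequence[str],
    pick_signature: PickSignature,
    min_group_size: int = 2,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Repeatedly label the largest group of identical pages.

    Returns (clusters, remaining): pages grouped by label, plus the pages
    that never fell into a large enough group.
    """
    remaining = list(pages)
    clusters: Dict[str, List[str]] = {}
    while remaining:
        # Largest group of byte-identical pages among the unlabeled ones.
        body, count = Counter(remaining).most_common(1)[0]
        if count < min_group_size:
            break
        label, signature = pick_signature(body)
        if signature not in body:
            raise ValueError("signature must occur in the page it was derived from")
        # The signature also catches near-identical pages outside the group.
        clusters.setdefault(label, []).extend(p for p in remaining if signature in p)
        remaining = [p for p in remaining if signature not in p]
    return clusters, remaining
```

Each round removes every page matching the new signature, so later rounds only look at pages that are still unexplained; this is what lets a short signature such as the Barracuda header cell above stand in for a whole cluster.
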
Image Clustering
● Cluster pages with dynamic content - DBSCAN algorithm
● Tremendously reduce the manual effort - 1 page in 200 groups

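For readers unfamiliar with DBSCAN, the following is a minimal pure-Python sketch of the algorithm applied to per-page feature vectors. The slide names only the algorithm; how pages are turned into vectors (for example from rendered screenshots), the `eps` and `min_samples` values, and the `representatives` helper are all assumptions made for illustration.

```python
from math import dist
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster label per point; NOISE marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # All points (including i itself) within eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_samples:  # j is a core point, keep expanding
                queue.extend(nearby)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


def representatives(labels: Sequence[int]) -> Dict[int, int]:
    """Index of the first member of each cluster, e.g. for manual review."""
    first: Dict[int, int] = {}
    for i, lab in enumerate(labels):
        if lab != NOISE:
            first.setdefault(lab, i)
    return first
```

Reviewing one representative per cluster instead of every page is the kind of saving in manual effort the slide refers to; a production pipeline would more likely use a library implementation with an index for neighbor lookups, since this sketch compares every pair of points.
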
FilterMap
FilterMap enables a continuous, sustainable, data-driven view of filter deployment

Results
FilterMap creates a map of filter deployments based on the vantage points measured

FilterMap Results
● FilterMap found 90 blockpage clusters (Clusters indicate either vendors or actors)
● Filters are deployed in many locations in 103 countries
● Filter types found - Commercial products, national firewalls, ISP and organizational deployments

Commercial Filters
● 15 commercial filters used in 102 countries
● Sold by companies in the US
● Filters found in 36 out of 48 countries labelled as “Not Free” or “Partly Free” by Freedom House
● Pornography, gambling, provocative attire and anonymization tools most commonly blocked

FilterMap Results
● 4 National Firewalls - Iran, Saudi Arabia, Bahrain and South Korea
● Large number of filters in ISPs, especially in Russia