Workshop on Active Internet Measurements, CAIDA, Feb. 8, 2012
Characterizing Global Web Censorship: Why is it so hard?
Phillipa Gill, The Citizen Lab/Stony Brook University
Work done in collaboration with: Masashi Crete Nishihata, Jakub Dalek, Sharon Goldberg, Adam Senft and Greg Wiseman
Overview
Large-scale, politically driven Internet outages are well known, but what happens within countries is less well understood.
• We leverage data gathered by an interdisciplinary group (the OpenNet Initiative) to bootstrap our analysis
– 77 countries, 286 distinct ISPs, measured from 2007 to 2012
• Advantages: context about what, when, and where to measure
• Disadvantages: dearth of technical data/raw measurements
• Our results highlight important challenges for censorship research!
Background
• Where censorship can happen, as a decision flow over one request:
– DNS reply? No: DNS blocking. Yes: continue.
– DNS redirect? Yes: DNS blocking. No: continue.
– Response to SYN? No: IP blocking. Yes: continue.
– Response to HTTP request? No: no HTTP reply. Yes: what was it?
– Possible HTTP-level responses: RST, infinite HTTP redirect, block page
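The decision flow above can be sketched as a small classifier. This is only an illustration, not the project's software: the `Observation` record, its field names, and the `"Accessible"` fallback label are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical record of one field test of one URL."""
    dns_reply: bool                       # did the resolver answer at all?
    dns_redirected: bool = False          # did the answer point somewhere unexpected?
    syn_answered: bool = False            # did the server respond to the TCP SYN?
    http_response: Optional[str] = None   # None, "rst", "redirect_loop", "blockpage" or "ok"


def classify(obs: Observation) -> str:
    """Walk the flowchart top to bottom and return the first matching label."""
    if not obs.dns_reply:
        return "DNS blocking"
    if obs.dns_redirected:
        return "DNS blocking"
    if not obs.syn_answered:
        return "IP blocking"
    if obs.http_response is None:
        return "No HTTP reply"
    labels = {
        "rst": "RST",
        "redirect_loop": "Infinite HTTP redirect",
        "blockpage": "Block page",
    }
    # Anything else is treated as a normal response (an assumption of this sketch).
    return labels.get(obs.http_response, "Accessible")


print(classify(Observation(dns_reply=True, syn_answered=True, http_response="rst")))
```

Checking the layers in order matters: a missing DNS reply hides everything below it, so a test can only report the first layer at which the request failed.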
Methodology
• Basic idea: issue requests for a consistent set of sites in the field and at a control location (lab)
• Software synchronizes the requests between lab and field
• Once both lab and field have completed, results are sent back to the lab for more analysis
• What is tested:
– Sites that are likely to trigger censorship
– Determined in collaboration with regional groups
• Where tests are run:
– Combination of targeted/opportunistic testing
– Performed by regional collaborators after an informed consent meeting
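A minimal sketch of the lab/field comparison described above, assuming each side reports one outcome string per URL (`"ok"` or a failure name). The function name, the outcome strings, and the verdict labels are hypothetical; the actual analysis pipeline is not described in these slides.

```python
from typing import Dict


def compare(field: Dict[str, str], lab: Dict[str, str]) -> Dict[str, str]:
    """Label each URL tested on both sides by comparing field and lab outcomes."""
    verdicts = {}
    for url in sorted(field.keys() & lab.keys()):
        if lab[url] != "ok":
            # The control could not fetch the site either, so the field
            # failure says nothing about censorship.
            verdicts[url] = "inconclusive"
        elif field[url] == "ok":
            verdicts[url] = "accessible"
        else:
            verdicts[url] = "possibly blocked: " + field[url]
    return verdicts


# Made-up outcomes for illustration only.
field = {"a.example": "ok", "b.example": "rst", "c.example": "no_dns_reply"}
lab = {"a.example": "ok", "b.example": "ok", "c.example": "no_dns_reply"}
print(compare(field, lab))
```

The control result is what separates a site that is simply down from one that fails only in the field.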
Challenges for censorship research
1. Variation between countries
[Figure: fraction of blocking results by country (China, Iran, UAE, Yemen, Burma, Vietnam), broken down by type: No DNS Reply, DNS Redirection, No HTTP Reply, RST, Blockpage]
• There is no such thing as a “representative” country
2. Variation between ISPs
[Figure: decentralized blocking in UAE. Fraction of content blocked by year (2007-2012) for AS 5384 and AS 15802]
• “Du” ISP does not censor prior to April 2008
• Censorship is a per-ISP property (when censorship is decentralized)
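To treat censorship as a per-ISP property, results have to be grouped by network before any fraction is computed. The sketch below shows one way to do that, assuming each test result is a hypothetical `(asn, year, blocked)` tuple; the sample rows are made up and are not measurements from this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def blocked_fraction(
    results: Iterable[Tuple[int, int, bool]]
) -> Dict[Tuple[int, int], float]:
    """Return the fraction of blocked tests for every (asn, year) pair."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for asn, year, is_blocked in results:
        totals[(asn, year)] += 1
        blocked[(asn, year)] += int(is_blocked)
    return {key: blocked[key] / totals[key] for key in totals}


# Made-up rows for illustration only.
sample = [
    (5384, 2008, True),
    (5384, 2008, False),
    (15802, 2008, False),
    (15802, 2008, False),
]
print(blocked_fraction(sample))
```

Aggregating the same rows by country alone would average the two networks together and hide the difference between them.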
2. Variation between types of networks
[Figure: Jaccard similarity coefficient by country]
• Academic networks block an average of 40% less!
• Academic networks are not representative!
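The Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The slides do not say exactly which sets were compared; the sketch below assumes, purely for illustration, that they are the sets of URLs found blocked on two networks in the same country.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up blocked-URL sets for illustration only.
academic = {"a.example", "b.example", "c.example"}
commercial = {"b.example", "c.example", "d.example"}
print(jaccard(academic, commercial))
```

A value of 1 means both networks block exactly the same URLs, and a value near 0 means their block lists barely overlap.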
3. Sudden temporal shifts in blocking
[Figure: censorship in Burma over time. Fraction of tests blocked by year (2009-2012) for four content themes: Political, Social, Internet, Conflict]
• End of military rule in 2011 brought political reforms.
• Need to measure over time and correlate with political changes
4. Stealthy blocking of certain content
[Figure: censorship of content in Yemen. Fraction of block results by theme (Political, Social, Internet, Conflict), broken down by type: No DNS Reply, No HTTP Reply, RST, Blockpage]
• Transparent blocking of social and Internet content
• “Stealthy” blocking of political and conflict-related content
• Measurement needs to be robust to distinguish failure from censorship
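One possible way to make a measurement more robust against transient failures is to repeat each test and only flag URLs that fail every time in the field while succeeding at least once in the lab. This is a sketch of that idea, not the method used in this work; the function name, the labels, and the `min_trials` threshold are assumptions.

```python
from typing import List


def verdict(field_trials: List[bool], lab_trials: List[bool], min_trials: int = 3) -> str:
    """Summarize repeated tests of one URL; True means the request succeeded."""
    if len(field_trials) < min_trials or not any(lab_trials):
        # Too few samples, or the site never loaded from the control either.
        return "inconclusive"
    if all(field_trials):
        return "accessible"
    if not any(field_trials):
        return "consistently failing"
    return "intermittent"


# Made-up trial outcomes for illustration only.
print(verdict([False, False, False], [True, True, True]))
print(verdict([True, False, True], [True, True, True]))
```

A block page identifies itself, but a dropped connection looks the same as an outage, which is why stealthy blocking needs repeated trials and a control before it can be called censorship.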
5. The type of content tested matters
[Figure: fraction blocked by country, local vs. global content]
• 3-5X more blocking of local content in China/Yemen
– Most blocked content is political
• Less discrepancy in UAE
– Most blocked content is social
• Need to take an interdisciplinary approach to determine what content to test
Challenges for censorship research:
1. Variations between technology used by countries
2. Variations between ISPs, and between ISPs and institutions
3. Sudden temporal shifts in blocking
4. Stealthy blocking of certain content
5. Locally relevant content is more likely to be blocked
And more!
… maintaining infrastructure across funding cycles/staff turnover
… informed consent/preserving user privacy when testing can pose a physical risk!
What’s next?
More measurements, taking an interdisciplinary approach to tackle the problem:
• Rigorous measurements + political context
Data sharing?
• Short answer: we’re working on it.
• Longer answer: this project has laid the foundation in terms of unifying the data and removing PII.
– Anticipate releasing data in the next ~4 months
What I hope to get out of this workshop
• Discuss how existing platforms may be used for censorship research
Particularly interested in:
– Platforms with visibility into the network edge
– DNS/BGP measurements
• Discuss how a large-scale, long-term censorship measurement platform may be built
• Discuss how we might distinguish transient failures/TCP bugs from actual censorship