Website Fingerprinting at Internet Scale



  1. Website Fingerprinting at Internet Scale. Andriy Panchenko (1), Fabian Lanze (1), Andreas Zinnen (2), Martin Henze (3), Jan Pennekamp (1), Klaus Wehrle (3), Thomas Engel (1). Affiliations: (1) Interdisciplinary Centre for Security, Reliability and Trust (SnT), Luxembourg; (2) RheinMain University of Applied Sciences, Germany; (3) RWTH Aachen University, Germany.

  2. Background: why people use Tor. Privacy has become a general concern. Access to the Internet is censored in many countries.

  3. Website Fingerprinting. [Figure: a client connected to a server through a network of Tor onion routers (OR).] Tor: The Onion Router. It is the most popular low-latency anonymization network, and many users rely on it to access unfiltered information.

  4. Website Fingerprinting. [Figure: the client's circuit to the server passes through an entry, a middle, and an exit onion router (OR).] Tor: The Onion Router. It is the most popular low-latency anonymization network, and many users rely on it to access unfiltered information.


  6. Website Fingerprinting. [Figure: Tor circuit (entry, middle, exit) between client and server, with a "?" marking the observer.] What is website fingerprinting? Identifying the website accessed without breaking cryptography. The attacker is a passive observer. Features are based on packet size, direction, ordering, and timing.

  7. Website Fingerprinting: state of the art. A widely discussed, hot topic in anonymity research. State-of-the-art approach: Wang et al. (USENIX Security '14), a k-Nearest Neighbor approach with manually selected features (e.g., bursts, unique lengths), about 4,000 features, and recognition rates > 90%. Two scenarios for evaluation. Closed world: the user visits only a fixed number of websites. Open world: the attacker monitors a set of sites, while the user may also visit unknown sites.

  8. Our method. Idea: don't try to guess which characteristics may be relevant; use a representation that implicitly covers all characteristics. Our feature set: $(\underbrace{N_{in}, N_{out}, S_{in}, S_{out}}_{\text{basic properties}},\ \underbrace{C_1, \dots, C_n}_{\text{cumulative features}})$. [Figure: cumulative sum of packet sizes vs. packet number for two traces, shown as curves C(T1) and C(T2) with the sampled points C_i marked on each.]
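
The feature set on this slide can be illustrated with a minimal sketch. Several details are assumptions that the slide does not state: that a trace is a list of signed packet sizes (incoming positive, outgoing negative), that N and S denote packet counts and summed sizes per direction, that the cumulative curve starts at the origin, and that the C_i are obtained by linear interpolation at n equidistant points. The function name `cumul_features` is hypothetical. With n = 100 the vector has 104 entries, which matches the feature count quoted on slide 11.

```python
from itertools import accumulate


def cumul_features(sizes, n=100):
    """Map a trace of any length to a fixed-length vector of 4 + n features.

    sizes: signed packet sizes; incoming > 0, outgoing < 0 (assumed convention).
    n:     number of sampled points on the cumulative curve.
    """
    if not sizes:
        raise ValueError("empty trace")

    # Basic properties: packet counts and total bytes per direction.
    n_in = sum(1 for s in sizes if s > 0)
    n_out = sum(1 for s in sizes if s < 0)
    s_in = sum(s for s in sizes if s > 0)
    s_out = sum(-s for s in sizes if s < 0)

    # Cumulative sum of packet sizes, starting at the origin (assumption).
    cumulative = [0] + list(accumulate(sizes))
    last = len(cumulative) - 1

    # Sample n equidistant points by linear interpolation between neighbours.
    samples = []
    for i in range(1, n + 1):
        x = last * i / n
        lo = int(x)
        hi = min(lo + 1, last)
        frac = x - lo
        samples.append(cumulative[lo] + frac * (cumulative[hi] - cumulative[lo]))

    return [n_in, n_out, s_in, s_out] + samples


# Two traces of different lengths yield vectors of the same length.
short_trace = [100, -50, 200, -50]
long_trace = [512, -64, 512, 512, -64, 1024, -64, 512]
print(len(cumul_features(short_trace)), len(cumul_features(long_trace)))
```

Because every trace is reduced to the same number of values, traces of different lengths become directly comparable inputs for a classifier.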

  9. Example. [Figure: feature value [kByte] vs. feature index (0 to 100) for about.com and google.de.] A fixed number of distinctive characteristics is obtained from traces with varying lengths. Fingerprints can be visualized. They are used as input for a Support Vector Machine.

  10. Layers of data representation. [Figure: Tor cells (Cell 1 to Cell 5) are packed into TLS records (Record 1, Record 2), which are carried in TCP packets (Packet 1 to Packet 3).] Information source for feature extraction: cell vs. TLS vs. TCP. The choice has a practically negligible effect on classification accuracy.

  11. Comparison with state of the art: classification.
      Closed world: accuracy [%] for the 100 most popular websites
                                   90 instances   40 instances
        k-NN (3736 features)       90.84          89.19
        Our method (104 features)  91.38          92.03
      Open world: foreground of 100 blocked websites, background of 9,000 popular websites
                                   TPR [%]        FPR [%]
        k-NN                       90.59          2.24
        Our method                 96.92          1.98

  12. Comparison of computational performance. [Figure: average processing time [h] on a logarithmic axis (10^-4 to 10^3) vs. background set size (0 to 50,000) for k-NN, CUMUL, and CUMUL (parallelized).] Computation time for 100 random monitored pages in the open world.

  13. Website fingerprinting in reality. Critique: the data sets used are not representative (too small, only popular websites / index pages); simplified assumptions; wrong metrics for evaluation. RND-WWW: how do people access the World Wide Web? Sources: Twitter, Alexa-one-click, Googling the trends, Googling at random, and pages censored in China; in total > 120,000 web pages. Tor-Exit: which pages do users actually access over Tor? Monitoring a Tor exit node ⇒ 211,148 web pages.

  14. Webpage fingerprinting at Internet scale. Question: Does the attack scale under realistic assumptions? Which metric to evaluate?
      Accuracy: fraction of true results.
      True positive rate / recall: fraction of monitored pages detected.
      False positive rate: fraction of false alarms.
      Problem: misleading interpretation ⇒ base rate fallacy.
      Precision: probability that the classifier is correct given that it has detected a monitored page.
      Focus of evaluation: precision and recall for increasing background set sizes, with a random subset as foreground.
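
The base rate fallacy named on this slide can be made concrete with a short sketch that applies Bayes' rule to a true positive rate, a false positive rate, and a base rate. The rates used below are the open-world figures for "our method" from slide 11; the base rates (the share of page loads that go to a monitored page) are hypothetical values chosen only for illustration, and the function name `precision_from_rates` is likewise an assumption.

```python
def precision_from_rates(tpr, fpr, base_rate):
    """Precision = P(page is monitored | classifier raises an alarm).

    tpr:       true positive rate (recall), in [0, 1]
    fpr:       false positive rate, in [0, 1]
    base_rate: prior probability that a visited page is monitored (hypothetical)
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must lie in [0, 1]")
    true_alarms = tpr * base_rate            # monitored pages that are flagged
    false_alarms = fpr * (1.0 - base_rate)   # unmonitored pages that are flagged
    total = true_alarms + false_alarms
    return true_alarms / total if total > 0 else 0.0


# TPR and FPR from slide 11 (96.92 % and 1.98 %); base rates are illustrative.
for base_rate in (0.5, 0.1, 0.01, 0.001):
    p = precision_from_rates(0.9692, 0.0198, base_rate)
    print(f"base rate {base_rate:>6}: precision {p:.3f}")
```

Under these assumed base rates, a false positive rate that looks small still produces mostly false alarms once monitored pages are rare, which is why precision is reported alongside recall.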

  15. Webpage fingerprinting at Internet scale. Question: Does the attack scale under realistic assumptions? Results for RND-WWW. [Figure: two panels showing the fraction of foreground pages [%] against recall (left) and precision (right), with one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884.]

  16. Webpage fingerprinting at Internet scale. Question: Does the attack scale under realistic assumptions? Results for Tor-Exit. [Figure: two panels showing the fraction of foreground pages [%] against recall (left) and precision (right), with one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884, 211148.]

  17. Webpage fingerprinting at Internet scale. Question: Does the attack scale under realistic assumptions? Results for Tor-Exit. [Figure: two panels showing the fraction of foreground pages [%] against recall (left) and precision (right), with one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884, 211148.] Answer: No.

  18. Webpage fingerprinting at Internet scale. Question: Is it at least possible for certain pages?

  19. Webpage fingerprinting at Internet scale. Question: Is it at least possible for certain pages? Minimum number of mistakenly confused pages. [Figure: fraction of foreground pages [%] vs. number of webpage confusions (0 to 400), for b = 20,000, 50,000, and 100,000.] No single page is without a confusingly similar page in a realistic universe.

  20. How about fingerprinting web sites? (1/2) A website is a collection of web pages served under the same domain. Is it possible to fingerprint a website when only a subset of its pages is available for training? Experiment: 20 websites. [Figure: two confusion matrices with a 0.0 to 1.0 color scale, (a) only index pages and (b) different pages, over the websites ALJAZEERA, AMAZON, BBC, CNN, EBAY, FACEBOOK, IMDB, KICKASS, LOVESHACK, RAKUTEN, REDDIT, RT, SPIEGEL, STACKOVERFLOW, TMZ, TORPROJECT, TWITTER, WIKIPEDIA, XHAMSTER, XNXX.]

  21. How about fingerprinting web sites? (2/2) Transferring results from the closed-world to the realistic open-world setting is typically not trivial. Website fingerprinting scales better than webpage fingerprinting. [Figure: two panels plotting precision and recall (0.0 to 1.0) against background set size (0 to 120,000).]

  22. Summary. Our classifier with 104 features outperforms the state of the art. Alarming results obtained under simplified assumptions cannot be generalized. Webpage fingerprinting does not scale to appropriate universe sizes for any webpage. Website fingerprinting is not only more realistic but also significantly more effective. Conclusions drawn need to be reconsidered. Scripts and RND-WWW dataset: http://lorre.uni.lu/~andriy/zwiebelfreunde/

  23. We are hiring! Our lab within the Interdisciplinary Centre for Security, Reliability and Trust (Uni Luxembourg) is looking for PhD candidates and PostDocs in the area of anonymity and privacy. More information: http://secan-lab.uni.lu/jobs
