
Measuring Web Similarity from Dual-Stacked Hosts



  1. Measuring Web Similarity from Dual-Stacked Hosts
     Vaibhav Bajpai (Jacobs University, Bremen)
     CNSM 2016, Montréal, Canada, Oct 2016
     Joint work with: Steffie Jacob Eravuchira (SamKnows Limited, London), Jürgen Schönwälder (Jacobs University, Bremen), Sam Crawford (SamKnows Limited, London)
     Supported by: Flamingo Project (flamingo-project.eu), Leone Project (leone-project.eu)
     Outline: Introduction (Motivation, Research Question, Research Contributions); Methodology (Metrics and Implementation, Selection of Websites, Measurement Setup, Measurement Trial); Results (Success Rates, Causality Analysis); Takeaway

  2. Introduction | Motivation
     ▶ 4/5 RIRs have exhausted their available pool of IPv4 address space [1]: APNIC (Apr '11), RIPE (Sep '12), LACNIC (Jun '14), ARIN (Sep '15).
     ▶ Large IPv6 broadband rollouts¹ since World IPv6 Launch Day in 2012 [2].
     ▶ Global adoption of IPv6 has increased to ∼14.9% (native) [3] (Oct 2016): Belgium 45.39%, United States 28.89%, Switzerland 26.73%, Germany 25.93%.
     ¹ Comcast, Deutsche Telekom AG, AT&T, Verizon Wireless, T-Mobile USA

  3. Introduction | Research Questions
     Recent work [4], [5], [6] has compared performance of dual-stacked websites over IPv4 and IPv6. No study has compared web similarity over IPv4 and IPv6.
     [Figure: ALEXA 1M websites with AAAA records, 2010 to 2016, with W6D and W6LD marked. Source: http://www.employees.org/~dwing/aaaa-stats]
     We want to know:
     ▶ How similar are webpages accessed over IPv6 to their IPv4 counterparts?
     ▶ What factors contribute to the dissimilarity over IPv4 and IPv6?

  4. Introduction | Research Contributions
     To the best of our knowledge, this is the first study to:
     ▶ Measure webpage similarity over IPv4 and IPv6.
     ▶ Investigate IPv6 adoption that goes beyond the root page of a dual-stacked website.
     We measure against ALEXA top 100 dual-stacked websites.
     1. simweb: A tool for measuring web similarity over IPv4 and IPv6.
     2. Websites (27%) have some fraction of webpage elements failing over IPv6.
     3. Failure rates over IPv6 are largely due to DNS resolution errors on images, JS and CSS.
     4. Both same-origin and cross-origin sources contribute to the failure rates over IPv6.

  5. Methodology

  6. Methodology | SamKnows webget
     SamKnows [7] probes run webget², which reports the following as an aggregated report for a website:
     ▶ DNS lookup time.
     ▶ Time to first byte.
     ▶ HTTP request time.
     ▶ Content size.
     ▶ Download speed.
     Sample output:
       % webget 1 www.google.com
       version: WEBGETMT.2
       endtime: 1427820219
       status: OK
       target: www.google.com
       address: 2a00:1450:4008:801::1013
       fetch_time: 145270
       bytes_total: 194818
       bytes_sec: 1848376
       objects: 3
       threads: 1
       requests: 3
       connections: 1
       reused_connections: 2
       lookups: 1
       request_total_time: 128883
       request_min_time: 12930
       request_avg_time: 42961
       request_max_time: 100458
       ...
     ² files.samknows.com/~gpl

  7. Methodology | JUB simweb
     ▶ We extend the SamKnows webget test to measure webpage similarity: simweb.
     In addition, simweb also reports, for each webpage element of a website:
     ▶ Content Type
     ▶ Content Size
     ▶ Resource URL
     ▶ IP endpoint
     ▶ CURL response code
     ▶ HTTP status code
     Sample output:
       % SIMWEB_L=1 IPVERSION=6 webget 1 www.google.com
       #: 1
       version: SIMWEB.0
       service: www.google.com
       timestamp: 1427822156
       af: 6
       status: OK
       curl_response_code: CURLE_OK
       object_type: text/html:charset=ISO-8859-1
       http_code: 200
       resource_url: www.google.com
       ip_endpoint: 2a00:1450:4008:801::1010;
       size_bytes: 52674
       #: 2
       ...
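The per-element report on this slide is a flat run of `key: value` lines, with each webpage element introduced by a `#: N` line. A minimal sketch of how such output might be split into one record per element follows. The function name `parse_simweb_records` is hypothetical, and the assumption that every element starts with `#:` is taken from the sample output only, not from simweb documentation.

```python
from typing import Dict, List


def parse_simweb_records(output: str) -> List[Dict[str, str]]:
    """Split simweb-style 'key: value' output into one dict per webpage element.

    Assumption: each element starts with a '#: N' line, as in the sample output.
    """
    records: List[Dict[str, str]] = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line or line == "...":
            continue
        # Split on the first ': ' only, so IPv6 addresses and MIME types survive.
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        if key == "#":
            records.append({})  # a new webpage element begins
        if records:
            records[-1][key] = value.strip()
    return records


# Sample copied from the slide (first element only).
SAMPLE = """\
#: 1
version: SIMWEB.0
service: www.google.com
timestamp: 1427822156
af: 6
status: OK
curl_response_code: CURLE_OK
object_type: text/html:charset=ISO-8859-1
http_code: 200
resource_url: www.google.com
ip_endpoint: 2a00:1450:4008:801::1010;
size_bytes: 52674
"""

records = parse_simweb_records(SAMPLE)
print(records[0]["resource_url"], records[0]["af"], records[0]["size_bytes"])
```

Values are kept as strings, including the trailing `;` on `ip_endpoint`, because the slide does not say how that field is delimited.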

  8. Methodology | Metrics
     We use 2 well-known webpage complexity metrics from literature [8, 9]:
     1. Content Complexity: The number & size of fetched webpage elements.
     2. Service Complexity: The number of same-origin & cross-origin sources.
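As a rough illustration of the two metrics, the sketch below computes them from a list of per-element records carrying `resource_url` and `size_bytes` fields. It is one possible reading, not the authors' implementation: "same-origin" is approximated here by an exact hostname match with the website, service complexity is counted per element, and the example hostnames and sizes are made up.

```python
from typing import Dict, Iterable
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercase host of a URL, tolerating scheme-less values."""
    if "://" not in url and not url.startswith("//"):
        url = "//" + url  # lets urlsplit treat the leading part as the host
    return (urlsplit(url).hostname or "").lower()


def complexity(elements: Iterable[Dict[str, str]], site_host: str) -> Dict[str, int]:
    """Compute content and service complexity for one website.

    Assumption: an element is 'same-origin' when its host equals site_host.
    """
    site_host = site_host.lower()
    count = total_bytes = same_origin = cross_origin = 0
    cross_hosts = set()
    for element in elements:
        count += 1
        total_bytes += int(element["size_bytes"])
        host = hostname(element["resource_url"])
        if host == site_host:
            same_origin += 1
        else:
            cross_origin += 1
            cross_hosts.add(host)
    return {
        "element_count": count,                  # content complexity (number)
        "total_bytes": total_bytes,              # content complexity (size)
        "same_origin_elements": same_origin,     # service complexity
        "cross_origin_elements": cross_origin,   # service complexity
        "cross_origin_hosts": len(cross_hosts),
    }


# Hypothetical elements for illustration only.
elements = [
    {"resource_url": "www.example.com", "size_bytes": "52674"},
    {"resource_url": "http://www.example.com/logo.png", "size_bytes": "1200"},
    {"resource_url": "http://cdn.example.net/app.js", "size_bytes": "3400"},
]
result = complexity(elements, "www.example.com")
print(result)
```

Comparing the resulting dictionaries for the IPv4 and IPv6 fetches of the same website is one way to express how dissimilar the two versions are.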

  9. Methodology | Selection of Websites
     ▶ We use the ALEXA top 100 dual-stacked websites as measurement targets [4].
     1. www.google.com
     2. www.facebook.com
     3. www.youtube.com
     4. www.yahoo.com
     5. www.wikipedia.org
     6. www.qq.com
     7. www.blogspot.com
     8. …

  10. Methodology | Measurement Setup
     The simweb test:
     ▶ runs twice (once for each AF).
     ▶ repeats every hour.
     ▶ uses user-agent string: Mozilla/4.0
     [Figure: measurement setup. Labels: SamKnows Probe (simweb, tests) behind a DSL/Cable Modem; HTTP GET over IPv4 and IPv6 to the ALEXA top 100 dual-stacked websites; results sent by HTTPS POST to a Data Collector.]

  11. Methodology | Measurement Trial
     We measure from 80 dual-stacked SamKnows probes.

     NETWORK TYPE             #
     RESIDENTIAL             55
     NREN / RESEARCH         11
     BUSINESS / DATACENTER   09
     OPERATOR LAB            04
     IXP                     01

     RIR                      #
     RIPE                    42
     ARIN                    29
     APNIC                   07
     AFRINIC                 01
     LACNIC                  01

  12. Data Analysis³
     ³ Measurements conducted for 65 days between April 2015 and June 2015.

  13. Results | Success Rates
     Can we fetch all webpage elements over IPv6?
     ▶ 27% of websites show some rate of failure over IPv6.
     ▶ 9% exhibit more than 50% failures over IPv6.
     ▶ 6% show complete failure (0% success) over IPv6.

     Success Rate (%)
     #   Webpage                IPv6 (↓)  IPv4  W6LD
     01  www.bing.com              0      100    ✓
     02  www.detik.com             0      100    ✓
     03  www.engadget.com          0      100    ✓
     04  www.nifty.com             0      100
     05  www.qq.com                0      100
     06  www.sakura.ne.jp          0      100
     07  www.flipkart.com         09       99    ✓
     08  www.folha.uol.com.br     13      100
     09  www.aol.com              48      100    ✓
     10  www.comcast.net          52      100    ✓
     11  www.yahoo.com            72      100    ✓
     12  www.mozilla.org          84      100    ✓
     13  www.orange.fr            86      100    ✓
     14  www.seznam.cz            89      100    ✓
     15  www.mobile.de            90      100    ✓
     16  www.wikimedia.org        90      100
     17  www.t-online.de          93      100    ✓
     18  www.free.fr              95      100
     19  www.usps.com             95      100
     20  www.vk.com               95      100    ✓
     21  www.wikipedia.org        95      100    ✓
     22  www.wiktionary.org       95      100
     23  www.elmundo.es           96      100    ✓
     24  www.uol.com.br           96      100    ✓
     25  www.marca.com            97      100    ✓
     26  www.terra.com.br         98      100    ✓
     27  www.youm7.com            99      100

     [Figure: CDF of success rate (%) over IPv6 and IPv4 for the ALEXA top 100 websites.]
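The three percentages on this slide can be checked against the 27 listed rows. The short sketch below does that arithmetic, assuming (as the 27% figure implies) that the denominator is the 100 measured websites and that the 73 unlisted ones reached 100% success over IPv6.

```python
# IPv6 success rates (%) for the 27 websites listed on the slide.
ipv6_success = {
    "www.bing.com": 0, "www.detik.com": 0, "www.engadget.com": 0,
    "www.nifty.com": 0, "www.qq.com": 0, "www.sakura.ne.jp": 0,
    "www.flipkart.com": 9, "www.folha.uol.com.br": 13, "www.aol.com": 48,
    "www.comcast.net": 52, "www.yahoo.com": 72, "www.mozilla.org": 84,
    "www.orange.fr": 86, "www.seznam.cz": 89, "www.mobile.de": 90,
    "www.wikimedia.org": 90, "www.t-online.de": 93, "www.free.fr": 95,
    "www.usps.com": 95, "www.vk.com": 95, "www.wikipedia.org": 95,
    "www.wiktionary.org": 95, "www.elmundo.es": 96, "www.uol.com.br": 96,
    "www.marca.com": 97, "www.terra.com.br": 98, "www.youm7.com": 99,
}

TOTAL_WEBSITES = 100  # assumption: ALEXA top 100 dual-stacked websites

rates = list(ipv6_success.values())
some_failure = sum(1 for r in rates if r < 100)     # any failing element
majority_failure = sum(1 for r in rates if r < 50)  # more than 50% failures
complete_failure = sum(1 for r in rates if r == 0)  # 0% success


def percent(n: int) -> float:
    """Share of all measured websites, in percent."""
    return 100.0 * n / TOTAL_WEBSITES


print(percent(some_failure), percent(majority_failure), percent(complete_failure))
# prints: 27.0 9.0 6.0
```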

  14. Results | Success Rates
     ALEXA top 100 dual-stacked websites:
     ▶ 6% show complete failure over IPv6.
     ▶ Metrics that measure IPv6 adoption should account for changes in IPv6-readiness.

     Success Rate (%)
     #   Webpage                IPv6 (↓)  IPv4  W6LD
     01  www.bing.com              0      100    ✓
     02  www.detik.com             0      100    ✓
     03  www.engadget.com          0      100    ✓
     04  www.nifty.com             0      100
     05  www.qq.com                0      100
     06  www.sakura.ne.jp          0      100

     [Figure: TCP connect times (ms, log scale) over IPv6 and IPv4, 2013 to 2016, one panel each for www.bing.com, www.detik.com, www.engadget.com, www.nifty.com, www.qq.com and www.sakura.ne.jp.]
