TCP CONGESTION SIGNATURES
Srikanth Sundaresan (Princeton Univ.), Amogh Dhamdhere (CAIDA/UCSD), kc Claffy (CAIDA/UCSD), Mark Allman (ICSI)
www.caida.or

Typical Speed Tests Don’t Tell Us Much
• Upload and download throughput measurements: no information beyond that
What type of congestion did the TCP flow experience?

Two Potential Sources of Congestion in the End-to-end Path
• Self-induced congestion
  - Clear path; the flow itself induces congestion
  - e.g., last-mile access link
• External congestion
  - Flow starts on an already congested path
  - e.g., congested interconnect
Distinguishing the two cases has implications for users, ISPs, and regulators

How can we distinguish the two?
• Cannot distinguish using just throughput numbers
  - Access plan rates vary widely, and are typically not available to content / speed test providers
  - e.g., a speed test reports 5 Mbps: is that the access link rate (DSL), or a congested path?
We can use the dynamics of TCP’s startup phase, i.e., Congestion Signatures

TCP’s RTT Congestion Signatures
• Flows experiencing self-induced congestion fill up an empty buffer during slow start
  - Hence they increase the TCP flow RTT
• Externally congested flows encounter an already full buffer
  - Less potential for RTT increases
• Self-induced congestion therefore has higher RTT variance compared to external congestion
We can quantify this using Max-Min and CoV (coefficient of variation) of RTT
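
The two metrics named above can be sketched as follows. The slides do not give exact definitions, so this sketch assumes Max-Min is the plain difference between the largest and smallest RTT sample and CoV is the population standard deviation divided by the mean. The function name `rtt_signature` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def rtt_signature(rtt_samples_ms):
    """Return (max_min, cov) for RTT samples taken during slow start.

    Assumed definitions (not confirmed by the slides):
      max_min = max(RTT) - min(RTT)
      cov     = population std. dev. of RTT / mean RTT
    """
    if len(rtt_samples_ms) < 2:
        raise ValueError("need at least two RTT samples")
    max_min = max(rtt_samples_ms) - min(rtt_samples_ms)
    cov = pstdev(rtt_samples_ms) / mean(rtt_samples_ms)
    return max_min, cov


# Hypothetical RTT samples in ms, invented for illustration only.
self_induced = [20, 22, 30, 45, 70, 110]  # queue builds up from empty
external = [68, 70, 71, 69, 72, 70]       # queue is already close to full

print("self-induced:", rtt_signature(self_induced))
print("external:    ", rtt_signature(external))
```

With samples like these, the self-induced flow shows the larger value on both metrics, which is the contrast the slide describes.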

Example Controlled Experiment
• 20 Mbps “access” link with 100 ms buffer
• 1 Gbps “interconnect” link with 50 ms buffer
• Self-induced congestion flows have higher values for both metrics and are clearly distinguishable
[Figure: CDFs for “External” and “Self” flows. Top panel: Max-Min RTT (x-axis ticks 10^1 to 10^2). Bottom panel: CoV RTT (x-axis ticks 10^-2 to 10^0).]
The two types of congestion exhibit widely contrasting behaviors

Model
• Max-Min and CoV of RTT derived from RTT samples during slow start
• We feed the two metrics into a simple Decision Tree
  - We limit the depth of the tree to a low value to minimize complexity
• We build the decision tree classifier using controlled experiments and apply it to real-world data
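
A shallow decision tree over the two metrics might look like the sketch below. This is a minimal pure-Python illustration, not the authors' implementation. It assumes Gini impurity as the split criterion and midpoints between observed values as candidate thresholds. All function names and training rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, max_depth):
    """A leaf is a label string; an internal node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # every row has identical features, so no split exists
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    )


def classify(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training rows: (Max-Min RTT, CoV RTT), invented for illustration.
train_rows = [(90.0, 0.60), (75.0, 0.45), (60.0, 0.50),
              (4.0, 0.02), (8.0, 0.05), (12.0, 0.04)]
train_labels = ["self", "self", "self", "external", "external", "external"]

tree = build_tree(train_rows, train_labels, max_depth=2)  # keep the tree shallow
print(classify(tree, (80.0, 0.55)))
print(classify(tree, (5.0, 0.03)))
```

Capping `max_depth` is what keeps the model simple: with two features and a depth of two, the tree makes at most two threshold comparisons per flow.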

Validating the Method: Step 1 - Controlled Experiments
[Diagram: testbed with Pi 1, Pi 2, routers R1 and R2, Servers 1-4, and the Internet. Link labels: 100 Mbps, 1 Gbps, shaped “access”. Traffic labels: background cross-traffic, interconnect cross-traffic, throughput tests.]

It’s Real
[Image annotations: “Fantastic cabling effort”, “Post-it defined networking”]

Validating the Method: Step 1 - Controlled Experiments
• Emulated access link + “core” link
  - Wide range of access link throughputs, buffer sizes, loss rates, cross-traffic (background and congestion-inducing)
  - Can accurately label flows in training data as “self” or “externally” congested
High accuracy: precision and recall > 80%, robust to model settings
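
For reference, precision and recall for one class can be computed from labeled test flows as in the sketch below. The function name and the example labels are hypothetical and do not reproduce the reported results.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for the class named by `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and classifier output, for illustration only.
truth = ["self"] * 4 + ["external"] * 4
predicted = ["self", "self", "self", "external",
             "external", "external", "external", "self"]

print(precision_recall(truth, predicted, positive="self"))
```

Precision answers "of the flows called self-induced, how many really were", while recall answers "of the truly self-induced flows, how many were found".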

Validating the Method: Step 2
[Diagram labels: ISP B, congested link, ISP A, Ark VP]
• From an Ark VP in ISP A, we identified a congested link with ISP B using TSLP* (time-series latency probes)
*Luckie et al., “Challenges in Inferring Internet Interdomain Congestion”, IMC 2014

Validating the Method: Step 2
[Diagram labels: M-Lab NDT server, ISP B, congested link, ISP A, Ark VP]
• Periodic NDT tests from Ark VP to M-Lab NDT server “behind” the congested interdomain link

Validation of the Method: Step 2
[Figure: two time series from 02/18 to 03/11. Top: download throughput (d/l Mbps, 0 to 30), with regions annotated “externally” congested and “self” congested. Bottom: TSLP latency (far side), axis ticks 10 to 70.]
Strong correlation between throughput and TSLP latency: flows during elevated TSLP latency labeled as “externally” congested
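
The labeling rule above could be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how "elevated" latency was defined, so the window size, the elevation factor, the use of the minimum latency as a baseline, and the function name `label_tests` are all hypothetical.

```python
from statistics import median


def label_tests(ndt_tests, tslp_samples, window_s=900, elevation_factor=1.5):
    """Label each throughput test by the TSLP latency observed around it.

    ndt_tests:    list of (timestamp_s, download_mbps)
    tslp_samples: list of (timestamp_s, far_side_latency_ms)
    A test is "external" if the median latency within +/- window_s exceeds
    elevation_factor times the baseline (assumed here: the minimum latency).
    """
    baseline = min(lat for _, lat in tslp_samples)
    labeled = []
    for ts, mbps in ndt_tests:
        nearby = [lat for t, lat in tslp_samples if abs(t - ts) <= window_s]
        if not nearby:
            labeled.append((ts, mbps, None))  # no latency evidence: unlabeled
            continue
        elevated = median(nearby) > elevation_factor * baseline
        labeled.append((ts, mbps, "external" if elevated else "self"))
    return labeled


# Hypothetical measurements, invented for illustration only.
tslp = [(0, 20.0), (600, 21.0), (3600, 60.0), (4200, 62.0)]
tests = [(300, 25.0), (3900, 8.0), (20000, 24.0)]

for row in label_tests(tests, tslp):
    print(row)
```

Tests with no latency sample nearby are left unlabeled rather than guessed, so they can be excluded from any accuracy calculation.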

Validation of the Method: Step 2
[Figure repeated: download throughput (d/l Mbps) and TSLP latency (far side), 02/18 to 03/11]
75%+ accuracy in detecting external congestion, 100% accuracy for self-induced congestion

Validation of the Method: Step 3
• We use Measurement Lab’s NDT test data for real-world validation
• Cogent interconnect issue in late 2013/early 2014
  - NDT tests to Cogent servers saw significant drops in throughput during peak hours
  - Several major U.S. ISPs were affected; Cox was not
  - The problem was identified as congested interconnects

Using the M-Lab Data
[Figure: two panels, January 2014 and April 2014, each plotting throughput (Mbps, axis up to 40) against hour of day (local) for Comcast, TimeWarner, Cox, and Verizon]
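
A per-ISP, hour-of-day view like the one in the figure could be produced from test records roughly as sketched below. The slides do not state which aggregate was plotted, so the median is an assumption here. The record layout, the function name, and the placeholder ISP names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_hour(tests):
    """Group (isp, local_hour, mbps) records into {isp: {hour: median Mbps}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for isp, hour, mbps in tests:
        buckets[isp][hour].append(mbps)
    return {
        isp: {hour: median(values) for hour, values in sorted(hours.items())}
        for isp, hours in buckets.items()
    }


# Hypothetical records with placeholder ISP names, for illustration only.
records = [
    ("ISP-X", 3, 30.0), ("ISP-X", 3, 34.0),    # off-peak
    ("ISP-X", 20, 10.0), ("ISP-X", 20, 14.0),  # peak hours
    ("ISP-Y", 20, 29.0),
]

for isp, by_hour in median_by_hour(records).items():
    print(isp, by_hour)
```

A pronounced dip during evening hours for some ISPs but not others is the kind of diurnal pattern the slide uses to motivate the comparison.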