CompSci 514: Computer Networks L18: Datacenter Network Architectures II Xiaowei Yang 1
Outline • Design and evaluation of VL2 • Discussion – FatTree vs VL2 • What common challenges did each address? • What methods did each use to address those challenges? 2
Virtual Layer 2: A Scalable and Flexible Data-Center Network Microsoft Research Changhoon Kim Work with Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
Tenets of Cloud-Service Data Center • Agility: Assign any servers to any services – Boosts cloud utilization • Scaling out: Use large pools of commodity hardware – Achieves reliability, performance, low cost • Statistical Multiplexing Gain • Economies of Scale 4
What is VL2? The first DC network that enables agility in a scaled-out fashion • Why is agility important? – Today’s DC network inhibits the deployment of other technical advances toward agility • With VL2, cloud DCs can enjoy agility in full 5
Status Quo: Conventional DC Network
[Diagram: Internet → CR–CR (DC-Layer 3) → ARs → S switches (DC-Layer 2) → racks of app servers; ~1,000 servers/pod == IP subnet]
Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app. servers
Reference – "Data Center: Load Balancing Data Center Services", Cisco 2004 6
Conventional DC Network Problems
[Diagram: same hierarchy, with oversubscription of ~5:1 at the ToR layer, ~40:1 at the aggregation switches, and ~200:1 toward the core]
• Dependence on high-cost proprietary routers
• Extremely limited server-to-server capacity 7
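A rough way to see why server-to-server capacity is so limited is to apply the oversubscription ratios layer by layer. The sketch below is a hypothetical back-of-the-envelope calculation, assuming 1 Gbps server NICs and the ~5:1 / ~40:1 / ~200:1 ratios shown on the slide; it estimates the worst-case per-server bandwidth when traffic must cross each layer.

```python
# Back-of-the-envelope estimate of per-server bandwidth under oversubscription.
# Assumptions (hypothetical, for illustration): 1 Gbps server links and the
# cumulative oversubscription ratios from the slide (~5:1 at the ToR layer,
# ~40:1 at the aggregation layer, ~200:1 toward the core).

SERVER_LINK_GBPS = 1.0

# Cumulative oversubscription experienced by traffic that must cross each layer.
layers = {
    "within rack (ToR only)": 1.0,      # full line rate between servers on one ToR
    "across ToRs (~5:1)": 5.0,
    "across aggregation (~40:1)": 40.0,
    "across core (~200:1)": 200.0,
}

for layer, ratio in layers.items():
    worst_case_gbps = SERVER_LINK_GBPS / ratio
    print(f"{layer:28s} -> ~{worst_case_gbps * 1000:.0f} Mbps per server if all servers send")
```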
And More Problems …
[Diagram: the same hierarchy partitioned into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, with ~200:1 oversubscription toward the core]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency) 8
And More Problems …
[Diagram: as before, highlighting that moving servers between IP subnet (VLAN) #1 and #2 requires complicated manual L2/L3 re-configuration]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency) 9
And More Problems …
[Diagram: one subnet overloaded (revenue lost) while the other sits idle (expense wasted)]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency) 10
Designing VL2 • Measure to understand the characteristics of datacenter networks • Design routing schemes that work well with the observed traffic patterns • Q: limitations of this design approach? 11
Measuring Traffic • Instrumented a large cluster used for data mining and identified distinctive traffic patterns – a highly utilized 1500 node cluster in a data center that supports data mining on petabytes of data. – The servers are distributed roughly evenly across 75 ToR switches – Collected socket-level event logs from all machines over two months. 12
Traffic analysis
1. The ratio of traffic volume between servers in our data centers to traffic entering/leaving our data centers is currently around 4:1 (excluding CDN applications).
2. Datacenter computation is focused where high-speed access to data on memory or disk is fast and cheap. Although data is distributed across multiple data centers, intense computation and communication on data does not straddle data centers due to the cost of long-haul links.
3. The demand for bandwidth between servers inside a data center is growing faster than the demand for bandwidth to external hosts.
4. The network is a bottleneck to computation. We frequently see ToR switches whose uplinks are above 80% utilization.
13
Flow Distribution Analysis
[Plots: PDF and CDF of flow size vs. total bytes]
Figure: Mice are numerous; 99% of flows are smaller than 100 MB. However, more than 90% of bytes are in flows between 100 MB and 1 GB.
14
Number of Concurrent Flows
[Plots: PDF and CDF of the number of concurrent flows in/out of each machine]
Figure: Number of concurrent connections has two modes: (1) 10 flows per node more than 50% of the time and (2) 80 flows per node for at least 5% of the time.
15
Implications • The distributions of flow size and number of concurrent flows both imply that VLB will perform well on this traffic. Since even big flows are only 100 MB (1 s of transmit time at 1 Gbps), randomizing at flow granularity (rather than packet) will not cause perpetual congestion if there is unlucky placement of a few flows. • Moreover, adaptive routing schemes may be difficult to implement in the data center, since any reactive traffic engineering will need to run at least once a second if it wants to react to individual flows. 16
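The flow-granularity point can be illustrated with a small sketch. This is not VL2's implementation; it is a hypothetical illustration of flow-level Valiant Load Balancing in which every packet of a flow is hashed on its 5-tuple to pick one intermediate switch, so a flow's packets stay on one path (no reordering) while different flows spread across all intermediates.

```python
import hashlib

# Hypothetical set of intermediate (core) switches a flow can be "bounced" off.
INTERMEDIATE_SWITCHES = [f"int-{i}" for i in range(16)]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto):
    """Flow-granularity Valiant Load Balancing: hash the 5-tuple so that all
    packets of a flow use the same (pseudo-)randomly chosen intermediate switch."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(INTERMEDIATE_SWITCHES)
    return INTERMEDIATE_SWITCHES[index]

# All packets of one flow take the same path; a different flow (new source port)
# is likely to land on a different intermediate, spreading load across the core.
print(pick_intermediate("10.0.1.5", "10.0.7.9", 40123, 80, "tcp"))
print(pick_intermediate("10.0.1.5", "10.0.7.9", 40124, 80, "tcp"))
```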
Traffic Matrix Analysis • Q: Is there regularity in the traffic that might be exploited through careful measurement and traffic engineering? • Method – Compute the ToR-to-ToR TM — entry (i, j) of TM(t) is the number of bytes sent from servers in ToR i to servers in ToR j during the 100 s beginning at time t. We compute one TM for every 100 s interval, and servers outside the cluster are treated as belonging to a single "ToR" – Cluster similar TMs and choose one representative TM per cluster 17
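As a concrete illustration of this method (a sketch, not the authors' tooling; the log format and the server-to-ToR mapping are assumptions), the snippet below aggregates socket-level byte counts into one ToR-to-ToR matrix per 100 s window, mapping any server outside the cluster to a single pseudo-ToR.

```python
from collections import defaultdict

WINDOW = 100  # seconds per traffic matrix

# Assumed inputs: server_to_tor maps a server name to its ToR id, and
# flow_records is an iterable of (timestamp, src_server, dst_server, bytes)
# tuples taken from socket-level event logs.
def build_traffic_matrices(flow_records, server_to_tor, external_tor="EXTERNAL"):
    """Return {window_start: {(src_tor, dst_tor): bytes}}; servers outside the
    cluster are treated as belonging to a single pseudo-ToR."""
    tms = defaultdict(lambda: defaultdict(int))
    for ts, src, dst, nbytes in flow_records:
        window_start = int(ts // WINDOW) * WINDOW
        src_tor = server_to_tor.get(src, external_tor)
        dst_tor = server_to_tor.get(dst, external_tor)
        tms[window_start][(src_tor, dst_tor)] += nbytes
    return tms

# Example with made-up records:
records = [(12.0, "s1", "s2", 5_000), (13.5, "s1", "web.example.com", 700)]
tors = {"s1": "tor-01", "s2": "tor-02"}
print(dict(build_traffic_matrices(records, tors)[0]))
```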
Results • No representative TMs • On a timeseries of 864 TMs • Approximating with 50–60 clusters • The fitting error remains high (60%) and only decreases moderately beyond that point 18
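One way to reproduce this kind of analysis (a sketch under assumptions, not the paper's exact procedure) is to flatten each 100 s TM into a vector, run k-means for several values of k, and report a relative fitting error, i.e., how far each TM is from its cluster's representative. The random stand-in data below is only there to make the sketch runnable.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances via the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2
        d2 = (X**2).sum(1)[:, None] - 2 * X @ centers.T + (centers**2).sum(1)[None, :]
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def fitting_error(X, centers, labels):
    """Relative error when each TM is replaced by its cluster representative."""
    return np.linalg.norm(X - centers[labels]) / np.linalg.norm(X)

# 864 TMs (one per 100 s interval over a day), each flattened from a 76x76
# ToR-to-ToR matrix (75 ToRs plus one pseudo-ToR for external servers).
X = np.abs(np.random.default_rng(1).normal(size=(864, 76 * 76)))  # stand-in data
for k in (10, 50, 60):
    centers, labels = kmeans(X, k)
    print(k, round(fitting_error(X, centers, labels), 2))
```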
Instability of Traffic Patterns
[Panels: (a) index of the containing cluster vs. time in 100 s intervals, (b) histogram of run length, (c) histogram of log(time to repeat)]
Figure: Lack of short-term predictability: the cluster to which a traffic matrix belongs, i.e., the type of traffic mix in the TM, changes quickly and randomly.
19
Failure Characteristics • Most failures are small in size – 50% of network device failures involve < 4 devices – 95% of network device failures involve < 20 devices, while large correlated failures are rare (e.g., the largest correlated failure involved 217 switches) – Downtimes can be significant: 95% of failures are resolved in 10 min, 98% in < 1 hr, 99.6% in < 1 day, but 0.09% last > 10 days. 20
Questions to ponder • What design choices might change if different traffic patterns were observed? 21
Know Your Cloud DC: Challenges • Instrumented a large cluster used for data mining and identified distinctive traffic patterns • Traffic patterns are highly volatile – A large number of distinctive patterns even in a day • Traffic patterns are unpredictable – Correlation between patterns very weak Optimization should be done frequently and rapidly 22
Know Your Cloud DC: Opportunities • DC controller knows everything about hosts • Host OS’s are easily customizable • Probabilistic flow distribution would work well enough, because … – Flows are numerous and not huge – no elephants! – Commodity switch-to-switch links are substantially thicker (~ 10x) than the maximum thickness of a flow DC network can be made simple 23
All We Need is Just a Huge L2 Switch, or an Abstraction of One
[Diagram: the CR/AR/S hierarchy replaced by a single huge virtual L2 switch to which all racks of servers attach]
24
All We Need is Just a Huge L2 Switch, or an Abstraction of One
1. L2 semantics 2. Uniform high capacity 3. Performance isolation
[Diagram: all servers attached to one virtual L2 switch]
25
Specific Objectives and Solutions
Objective → Approach → Solution
1. Layer-2 semantics → Employ flat addressing → Name-location separation & resolution service
2. Uniform high capacity between servers → Guarantee bandwidth for hose-model traffic → Flow-based random traffic indirection (Valiant LB)
3. Performance isolation → Enforce hose model using existing mechanisms only → TCP
26
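To tie the table together, here is a hedged sketch (hypothetical names and packet format, not VL2's actual agent code) of what a host-side shim might do: resolve the destination's application address (AA) to the location address (LA) of its ToR via a directory service, pick an intermediate switch for Valiant Load Balancing by hashing the flow, and encapsulate the packet accordingly.

```python
import hashlib

# Hypothetical directory service: application address (AA) -> ToR location address (LA).
DIRECTORY = {"10.1.0.7": "LA-tor-42"}
INTERMEDIATES = [f"LA-int-{i}" for i in range(10)]

def encapsulate(src_aa, dst_aa, src_port, dst_port, payload):
    """Resolve AA -> LA and wrap the packet for two-stage (VLB) forwarding."""
    dst_tor_la = DIRECTORY[dst_aa]                      # name-location separation
    key = f"{src_aa}|{dst_aa}|{src_port}|{dst_port}".encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(INTERMEDIATES)
    intermediate_la = INTERMEDIATES[idx]                # flow-based random indirection
    # Outer header bounces the packet off an intermediate switch, which then
    # decapsulates and forwards it to the destination ToR (inner header).
    return {
        "outer_dst": intermediate_la,
        "inner_dst": dst_tor_la,
        "dst_aa": dst_aa,
        "payload": payload,
    }

print(encapsulate("10.1.0.3", "10.1.0.7", 5555, 80, b"hello"))
```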