Understanding Lifecycle Management Complexity of Datacenter Topologies Mingyang Zhang (USC), Radhika Niranjan Mysore (VMware Research), Sucha Supittayapornpong (USC), Ramesh Govindan (USC) 1
Datacenter topology designs: 5-layer Clos, Jellyfish [NSDI12], Xpander [CoNEXT16] 2
Previous focus: capacity and cost [Figure: Clos, Jellyfish, and Xpander compared on capacity and cost ($)] 3
Manageability has received very little attention! [Figure: Clos, Jellyfish, and Xpander compared on capacity, cost ($), and management complexity] 5
How does the complexity of managing data centers depend on the topology? Our Focus: Lifecycle management 6
Lifecycle management of datacenter topologies ⎯ Deployment: realizing a logical topology as a physical topology ⎯ Expansion: adding new switches to the deployed topology 8
Management complexity is important ⎯ Complex deployment stalls the rollout of services for a long time ⎯ Expensive considering the increasing traffic demand [Figure from Singh et al. Sigcomm15] 10
Management complexity is important ⎯ Topology expansion leads to a capacity drop due to rewiring ⎯ Complex expansion leads to degraded capacity for a long time [Figure: newly added switches] 11
Challenges and contributions
Challenge: How to characterize the management complexity? Contribution: Metrics ⎯ Deployment ⎯ Expansion
Challenge: How does topology structure affect the management complexity? Contribution: Comparison of topologies ⎯ No topology dominates ⎯ Principles learned
Challenge: Is there a topology family with lower management complexity, lower cost and high capacity? Contribution: New topology ⎯ FatClique 16
Lifecycle management overview ⎯ Problems: packaging, wiring, placement, rewiring... ⎯ Constraints: switch, rack, patch panel, cable tray... [Images: Broadcom Trident 3, rack, optical patch panel, cable tray] 17
Methodology From first principles ⎯ Understand in detail how topologies are deployed and expanded ⎯ Derive metrics that capture the complexity of these operations 18
Packaging (Deployment) [Figure: switches and servers packed into data center racks, with separate server racks and switch racks] Metric: number of switches 21
Wiring (Deployment) ⎯ Intra-rack links: short and cheap ⎯ Inter-rack links: run over cable trays (expensive) Main wiring complexity comes from inter-rack links! 24
Cable bundling (Deployment) Too many fibers to be handled individually! ⎯ Cable bundle: a fixed number of identical-length fibers between two clusters of network devices ⎯ Bundle type: capacity (# fibers in a bundle) and length 26
Cable bundling (Deployment) Bundle type: (bundle capacity, bundle length) [Figure: top view of racks] ⎯ Without bundling: 16 individual fibers, 4 different lengths ⎯ With bundling through a bundle aggregator: 8 equal-length bundles, 1 bundle type Metric: the number of bundle types 28
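The bundle-type metric can be illustrated with a small sketch. It assumes, as a simplification not stated on the slides, that all fibers of the same length between the same pair of clusters are grouped into one bundle. The function name `count_bundle_types` and the input layout are hypothetical.

```python
from collections import Counter

def count_bundle_types(fibers):
    """Count bundles and distinct bundle types.

    fibers: list of (cluster_a, cluster_b, length) tuples, one per fiber.
    Assumption (illustrative only): fibers with the same length between the
    same pair of clusters form one bundle. A bundle type is the pair
    (capacity, length), where capacity is the number of fibers in the bundle.
    """
    bundles = Counter()
    for a, b, length in fibers:
        pair = tuple(sorted((a, b)))  # links are undirected
        bundles[(pair, length)] += 1
    types = {(capacity, length) for (_pair, length), capacity in bundles.items()}
    return len(bundles), len(types)

# 16 fibers: 8 racks, each with 2 equal-length fibers to a shared aggregator.
regular = [(f"rack{i}", "aggregator", 10) for i in range(8) for _ in range(2)]
print(count_bundle_types(regular))  # (8, 1)

# Irregular wiring: fiber counts and lengths differ between cluster pairs.
irregular = [("a", "b", 5), ("a", "b", 5), ("a", "c", 7),
             ("b", "c", 7), ("b", "c", 7), ("b", "c", 7)]
print(count_bundle_types(irregular))  # (3, 3)
```

The first input mirrors the slide's bundled case (8 equal-length bundles, 1 bundle type). The second shows how non-uniform wiring makes every bundle its own type.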
Cable bundling (Deployment) It is hard to handle individual fibers of various lengths! [Figure: cabling without and with bundling, from Singh et al., Sigcomm15] 29
Role of patch panel in bundling (Deployment) ⎯ Aggregator: patch panel ⎯ Manual process Metric: the number of patch panels 31
Deployment complexity metrics ⎯ # switches ⎯ # patch panels ⎯ # bundle types 35
Expansion complexity New Metric: # Expansion steps 37
Complexity of a single expansion step It is hard to move existing links in cable trays [Figure: existing links and new links between racks, patch panels, and a new spine] Metric: # rewired links per patch panel rack 41
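A minimal sketch of how the per-rack rewiring metric could be tallied for one expansion step. The slides do not say whether the maximum or the average over racks is reported, so the sketch computes both. The function name, the input layout, and the panel and rack identifiers are all hypothetical.

```python
from collections import Counter
from statistics import mean

def rewired_links_per_rack(rewired_panels, panel_to_rack):
    """Tally rewired links per patch panel rack for one expansion step.

    rewired_panels: one patch panel id per rewired link, i.e. the panel at
                    which that link is rewired (hypothetical input layout).
    panel_to_rack:  dict mapping patch panel id -> patch panel rack id.
    Returns (per_rack_counts, max_per_rack, mean_per_rack); the mean is
    taken over racks that see at least one rewired link.
    """
    per_rack = Counter(panel_to_rack[panel] for panel in rewired_panels)
    if not per_rack:
        return {}, 0, 0.0
    return dict(per_rack), max(per_rack.values()), mean(per_rack.values())

panel_to_rack = {"pp1": "rackA", "pp2": "rackA", "pp3": "rackB"}
rewired = ["pp1", "pp1", "pp2", "pp3"]
print(rewired_links_per_rack(rewired, panel_to_rack))
```

In this toy step, three links are rewired in `rackA` and one in `rackB`. A step that concentrates its rewiring in a single rack scores higher on this metric than one that spreads the same work across racks.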
Metrics ⎯ Deployment: # switches, # patch panels, # bundle types ⎯ Expansion: # expansion steps, # rewired links per patch panel rack 42
Topology comparison case study We equalize capacities of topologies [Figure: 4-layer Clos (Medium) and Jellyfish compared on five metrics: switches, patch panels, bundle types, expansion steps, re-wired links per patch panel rack] No topology dominates by all metrics! 47
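"No topology dominates" can be read as a statement about Pareto dominance over the five metrics, where lower is better for each. The sketch below shows that check. The metric values are made up for illustration and are not results from the talk; `topology_a` and `topology_b` are placeholder names.

```python
def dominates(a, b):
    """Return True if topology a is at least as good as b on every metric
    and strictly better on at least one (lower values are better)."""
    assert a.keys() == b.keys(), "both topologies need the same metrics"
    no_worse = all(a[m] <= b[m] for m in a)
    strictly_better = any(a[m] < b[m] for m in a)
    return no_worse and strictly_better

# Hypothetical numbers, chosen only to show a case where neither side wins.
topology_a = {"switches": 100, "patch_panels": 20, "bundle_types": 5,
              "expansion_steps": 6, "rewired_links_per_rack": 50}
topology_b = {"switches": 80, "patch_panels": 40, "bundle_types": 50,
              "expansion_steps": 3, "rewired_links_per_rack": 30}

print(dominates(topology_a, topology_b))  # False
print(dominates(topology_b, topology_a))  # False
```

Here each topology is better on some metrics and worse on others, so neither dominates.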
Principles learned ⎯ Importance of regularity ⎯ Importance of maximizing intra-rack links ⎯ Importance of fat edge 48
Principle 1: Importance of regularity Jellyfish is a random graph, which leads to non-uniform bundles between switch clusters. At large scale, Jellyfish has an order of magnitude more bundle types than Clos! 49
Principle 2: Importance of maximizing intra-rack links [Figure: intra-rack and inter-rack links of the switches in a rack, for Clos and for Jellyfish] Most links in Jellyfish are inter-rack links, which leads to more patch panel usage and higher wiring complexity! 51
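The intra-rack versus inter-rack split can be computed directly from a switch-to-rack placement. The sketch below assumes the topology is given as a list of switch pairs and the placement as a dictionary; both names and the toy data are hypothetical.

```python
def inter_rack_fraction(links, rack_of):
    """Fraction of links whose two endpoints sit in different racks.

    links:   list of (switch_u, switch_v) pairs.
    rack_of: dict mapping switch id -> rack id (the packaging decision).
    """
    if not links:
        return 0.0
    inter = sum(1 for u, v in links if rack_of[u] != rack_of[v])
    return inter / len(links)

# Toy placement: s1 and s2 share rack r1, s3 and s4 share rack r2.
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2", "s4": "r2"}
links = [("s1", "s2"), ("s3", "s4"), ("s1", "s3"), ("s2", "s4")]
print(inter_rack_fraction(links, rack_of))  # 0.5
```

A placement that keeps more links inside a rack lowers this fraction, and with it the number of links that must cross cable trays and patch panels.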
Principle 3: Importance of fat edge [Figure: network edge switches with northbound links above and southbound links down to servers] ⎯ Thin edge: North:South = 1:1 ⎯ Fat edge: North:South = 2:1 ⎯ Rewiring leads to a capacity drop; drain traffic before rewiring ⎯ Residual capacity requirement during expansion: 75% ⎯ Draining 25% of links --> 25% capacity loss 56
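The arithmetic behind the fat-edge principle can be sketched with a simple model. The model is an assumption for illustration: edge capacity is normalized to the southbound (server-facing) capacity, all northbound links have equal speed, and traffic spreads evenly over the northbound links that are not drained. The function names are hypothetical.

```python
def residual_capacity(north_south_ratio, drained_fraction):
    """Capacity left at the edge, as a fraction of southbound capacity,
    after draining a fraction of the northbound links (simplified model)."""
    return min(1.0, north_south_ratio * (1.0 - drained_fraction))

def max_drain_fraction(north_south_ratio, required_residual):
    """Largest fraction of northbound links that can be drained at once
    while still meeting the residual capacity requirement."""
    return max(0.0, 1.0 - required_residual / north_south_ratio)

# Thin edge (1:1): draining 25% of links leaves 75% capacity, a 25% loss.
print(residual_capacity(1.0, 0.25))   # 0.75
# Fat edge (2:1): the same drain leaves the edge at full capacity.
print(residual_capacity(2.0, 0.25))   # 1.0

# With a 75% residual capacity requirement:
print(max_drain_fraction(1.0, 0.75))  # 0.25
print(max_drain_fraction(2.0, 0.75))  # 0.625
```

Under this model, a thin edge can drain at most 25% of its northbound links per step, while a fat edge can drain up to 62.5%. That suggests why a fat edge would need fewer expansion steps.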