
Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades



  1. Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades. Andy Curtis. Joint work with: S. Keshav, Alejandro López-Ortiz, Tommy Carpenter, Mustafa Elsheikh. University of Waterloo

  2. Motivation • Data centers critical part of IT infrastructure Data centers change over time • Expensive - $1000/year/server -

  3. Data centers constantly evolve - 63% of Data Center Knowledge readers are either in the midst of data center expansion projects or have just completed a new facility - 59% continue to build and manage their data centers in-house http://www.datacenterknowledge.com/archives/2010/08/16/data-center-industry-expansion-in-full-swing/

  4. Network upgrade motivation

  5. Network upgrade motivation • Several prior solutions for greenfield data centers - VL2, flattened butterfly, HyperX, BCube, DCell, Al-Fares et al., MDCube

  6. Network upgrade motivation • Several prior solutions for greenfield data centers - VL2, flattened butterfly, HyperX, BCube, DCell, Al-Fares et al., MDCube • What about legacy data centers?

  7. Existing topologies are not flexible enough

  8. Existing topologies are not flexible enough

  9. Existing topologies are not flexible enough ?

  10. Goal It should be easy and cost-effective to add capacity to a data center network

  11. Challenging problem • Designing a data center expansion or upgrade isn’t easy - Huge design space - Many constraints

  12. Problem 1 • It’s hard to analyze and understand heterogeneous topologies Problem 2 • How to design an upgraded topology?

  13. Problem 1 • High performance network topologies are based on rigid constructions - Homogeneous switches - Prescribed switch radix - Single link rate

  14. Problem 1 • High performance network topologies are based on rigid constructions - Homogeneous switches - Prescribed switch radix - Single link rate Solutions: 1. develop theory of heterogeneous Clos networks 2. explore unstructured data center network topologies

  15. Two solutions: LEGUP: output is a heterogeneous Clos network [Curtis, Keshav, López-Ortiz; CoNEXT 2010] REWIRE: designs unstructured DCN topologies [Curtis et al.; INFOCOM 2012]

  16. Two solutions: LEGUP: output is a heterogeneous Clos network [Curtis, Keshav, López-Ortiz; CoNEXT 2010] REWIRE: designs unstructured DCN topologies [Curtis et al.; INFOCOM 2012]

  17. LEGUP in brief: LEGUP designs upgraded/expanded networks for legacy data center networks

  18. LEGUP in brief: LEGUP designs upgraded/expanded networks for legacy data center networks Input • Budget • Existing network topology • List of switches & line cards • Optional: data center model

  19. LEGUP in brief: LEGUP designs upgraded/expanded networks for legacy data center networks [Figure: input network and output network]

  20. LEGUP in brief: LEGUP designs upgraded/expanded networks for legacy data center networks [Figure: input network and output network]

  21. LEGUP in brief: LEGUP designs upgraded/expanded networks for legacy data center networks [Figure: input network and output network] Difficult optimization problem

  22. Difficult optimization problem First pass: limit solution space by finding only heterogeneous Clos networks

  23. Clos networks This is a physical realization of a Clos network [Figure: Internet, Core, Aggregation, and ToR layers]

  24. Clos networks We can find a logical topology for this network [Figure: logical topology with node labels 16, 16 and 4, 4, 4, 4, 4, 4, 4, 4]

  25. Heterogeneous Clos networks Logical topology is a forest [Figure: logical forest with node labels 8, 8, 8, 8 and 2, 2]

  26. Theoretical contributions *optimal = uses same link capacity as an equivalent stage Clos network

  27. Theoretical contributions Lemma 1: How to construct all optimal logical forests for a set of switches *optimal = uses same link capacity as an equivalent stage Clos network

  28. Theoretical contributions Lemma 1: How to construct all optimal logical forests for a set of switches Lemma 2: How to build a physical realization from a logical forest *optimal = uses same link capacity as an equivalent stage Clos network

  29. Theoretical contributions Lemma 1: How to construct all optimal logical forests for a set of switches Lemma 2: How to build a physical realization from a logical forest Theorem: A characterization of heterogeneous Clos networks *optimal = uses same link capacity as an equivalent stage Clos network

  30. Theoretical contributions Lemma 1: How to construct all optimal logical forests for a set of switches Lemma 2: How to build a physical realization from a logical forest Theorem: A characterization of heterogeneous Clos networks This is the first optimal heterogeneous topology *optimal = uses same link capacity as an equivalent stage Clos network

  31. Problem 1 • It’s hard to analyze and understand heterogeneous topologies more later... Problem 2 • How to design an upgraded topology?

  32. Problem 1 • It’s hard to analyze and understand heterogeneous topologies Problem 2 • How to design an upgraded topology? heterogeneous Clos

  33. Problem 2 Upgraded network should: • Maximize performance, minimize cost • Be realized in the target data center • Incorporate existing network equipment if it makes sense Approach: use optimization

  34. LEGUP algorithm • Branch and bound search of solution space - Heuristics to map switches to a rack • See paper for details • Running time is the bottleneck in the algorithm - Exponential in number of switch types and (worst-case) in number of ToRs - 760 server data center: 5–10 minutes to run algorithm - 7600 server data center: 1–2 days - But can be parallelized

  35. LEGUP summary • Developed theory of heterogeneous Clos networks • Implemented LEGUP design algorithm • On our data center, we see substantial cost savings: spend less than half as much money as a fat-tree for same performance

  36. Two solutions: LEGUP: output is a heterogeneous Clos network [Curtis, Keshav, López-Ortiz; CoNEXT 2010] REWIRE: designs unstructured DCN topologies [Curtis et al.; INFOCOM 2012]

  37. Can we do better with unstructured networks? [Figure: an unstructured network topology with numbered nodes]

  38. Problem • Now we have an even harder network design problem

  39. Problem • Now we have an even harder network design problem Approach • Use local search heuristics to find a “good enough” solution

  40. REWIRE Uses simulated annealing to find a network that: - Maximizes performance Subject to: - The budget - Physical constraints of the data center model (thermal, power, space) - No topology restrictions

  41. REWIRE Uses simulated annealing to find a network that: - Maximizes performance [callout: Bisection bandwidth - Diameter] Subject to: - The budget - Physical constraints of the data center model (thermal, power, space) - No topology restrictions

  42. REWIRE Uses simulated annealing to find a network that: - Maximizes performance Subject to: - The budget [callout: Costs = new cables + moved cables + new switches] - Physical constraints of the data center model (thermal, power, space) - No topology restrictions

  43. Simulated annealing algorithm • At each iteration, computes - Performance of candidate solution - If accept this solution, then • Compute next neighbor to consider

  44. Simulated annealing algorithm • At each iteration, computes - Performance of candidate solution - If accept this solution, then • Compute next neighbor to consider [callout: No known algorithm to find the bisection bandwidth of an arbitrary network!]
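
The loop on this slide can be sketched generically. The following is a minimal simulated annealing skeleton, not REWIRE's implementation: the function names, the geometric cooling schedule, and the toy integer problem are all assumptions made for illustration. In REWIRE's setting, `score` would be the network performance measure and `neighbor` would propose a modified topology that respects the budget and physical constraints.

```python
import math
import random


def simulated_annealing(start, score, neighbor, steps=2000,
                        temperature=1.0, cooling=0.995, seed=0):
    """Generic maximizing simulated annealing loop (illustrative only)."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        # Compute the next neighbor to consider and its performance.
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling
    return best, best_score


# Toy stand-in problem (hypothetical): maximize -(x - 3)^2 over the integers.
def toy_score(x):
    return -(x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


best, best_score = simulated_annealing(0, toy_score, toy_neighbor)
```

The expensive step in practice is `score`: every iteration needs the performance of a candidate network, which is why the cost of computing bisection bandwidth matters so much here.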

  45. Bisection bandwidth computation Easy for a single cut

  46. Bisection bandwidth computation [Figure: a cut partitioning the network into S and S’]

  47. Bisection bandwidth computation bw(S,S’) = link cap(S,S’) / min { server rates(S), server rates(S’) } [Figure: cut (S, S’)]

  48. Bisection bandwidth computation bw(S,S’) = 4 / min { 2, 6 } [Figure: cut (S, S’)]
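
The single-cut computation can be evaluated directly. Below is a minimal sketch, assuming the network is given as a dict of link capacities keyed by node pairs and a dict of per-node server rates; these representations, the function name, and the four-node example are hypothetical. The example is chosen to match the slide's numbers: link capacity 4 across the cut, server rates 2 and 6 on the two sides.

```python
def cut_bandwidth(links, rates, side_s):
    """Bandwidth of the cut (S, S'): crossing link capacity divided by
    the smaller of the two sides' total server rates.

    links:  dict mapping frozenset({u, v}) -> link capacity
    rates:  dict mapping node -> total server rate attached to that node
    side_s: set of nodes forming S; every other node is in S'
    """
    side_s = set(side_s)
    others = set(rates) - side_s
    if not side_s or not others:
        raise ValueError("both sides of the cut must be non-empty")
    # A link crosses the cut when exactly one endpoint lies in S.
    link_cap = sum(cap for pair, cap in links.items()
                   if len(pair & side_s) == 1)
    demand = min(sum(rates[n] for n in side_s),
                 sum(rates[n] for n in others))
    if demand == 0:
        return float("inf")  # no server traffic can cross this cut
    return link_cap / demand


# Hypothetical example: S = {a, b} with rates 1 + 1 = 2,
# S' = {c, d} with rates 3 + 3 = 6, and two crossing links of capacity 2.
links = {
    frozenset(("a", "b")): 5,  # inside S, does not cross
    frozenset(("c", "d")): 5,  # inside S', does not cross
    frozenset(("a", "c")): 2,
    frozenset(("b", "d")): 2,
}
rates = {"a": 1, "b": 1, "c": 3, "d": 3}
result = cut_bandwidth(links, rates, {"a", "b"})  # 4 / min(2, 6) = 2.0
```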

  49. Bisection bandwidth computation Then bisection bandwidth is the min over all cuts [Figure: cut (S, S’)]

  50. Bisection bandwidth computation • Easy on tree-like topologies because there are O(n) cuts

  51. Bisection bandwidth computation • Easy on tree-like topologies because there are O(n) cuts

  52. Bisection bandwidth computation • Easy on tree-like topologies because there are O(n) cuts

  53. Bisection bandwidth computation • Easy on tree-like topologies because there are O(n) cuts

  54. Bisection bandwidth computation

  55. Bisection bandwidth computation Exponentially many cuts on arbitrary topologies
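
To make the blow-up concrete, a brute-force sketch can enumerate every cut of a small network and take the minimum, using the per-cut definition from the earlier slides (crossing link capacity divided by the smaller side's total server rate). The data layout, function name, and ring example are assumptions for illustration. A network with n nodes has 2^(n-1) - 1 distinct cuts, so this approach only works for toy inputs.

```python
from itertools import combinations


def bisection_bandwidth_bruteforce(links, rates):
    """Return (minimum cut bandwidth, number of cuts examined).

    links: dict mapping frozenset({u, v}) -> link capacity
    rates: dict mapping node -> total server rate attached to that node
    """
    nodes = sorted(rates)
    anchor, rest = nodes[0], nodes[1:]
    best, examined = float("inf"), 0
    # Keep `anchor` in S so each unordered cut {S, S'} is visited once;
    # k < len(rest) guarantees S' is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side_s = {anchor, *extra}
            examined += 1
            link_cap = sum(cap for pair, cap in links.items()
                           if len(pair & side_s) == 1)
            rate_s = sum(rates[n] for n in side_s)
            rate_other = sum(rates[n] for n in nodes if n not in side_s)
            demand = min(rate_s, rate_other)
            if demand > 0:  # cuts with no server traffic impose no limit
                best = min(best, link_cap / demand)
    return best, examined


# Hypothetical 4-node ring with unit link capacities and unit server rates.
ring_links = {frozenset(p): 1
              for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}
ring_rates = {"a": 1, "b": 1, "c": 1, "d": 1}
bandwidth, examined = bisection_bandwidth_bruteforce(ring_links, ring_rates)
# bandwidth is 1.0 (for example the cut {a, b} | {c, d}); examined is 7
```

Each added node doubles the number of cuts, which is why an arbitrary topology needs a theorem rather than enumeration.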

  56. Bisection bandwidth computation Exponentially many cuts on arbitrary topologies Need: A min-cut, max-flow type theorem for multi-commodity flow [Figure: single source s and sink t]

  57. Bisection bandwidth computation Need: A min-cut, max-flow type theorem for multi-commodity flow [Figure: multiple sources and sinks labeled s1, s2, s3, t1, t2]

  58. Bisection bandwidth computation

  59. Bisection bandwidth computation Theorem [Curtis and López-Ortiz, INFOCOM 2009]: A network can feasibly route all traffic matrices feasible under the server NIC rates using multipath routing iff all its cuts have bandwidth ≥ a sum dependent on α_i for all nodes i
