
ProjecToR: Agile Reconfigurable Data Center Interconnect



  1. ProjecToR: Agile Reconfigurable Data Center Interconnect
     Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Pierre Blanche, Houman Rastegarfar, Nikhil Devanur, Janardhan Kulkarni, Madeleine Glick, Daniel Kilper, Gireeja Ranade

  2. Today's data center interconnects
     [Figure: ToRs A, B, C, D connected by 10 Gbps links, with two demand matrices]
     • Ideal demand matrix: uniform and static (the same demand, 3, between every pair of ToRs A to D)
     • Non-ideal demand matrix: skewed and dynamic
     • Static capacity between ToR pairs

  3. Need for a reconfigurable interconnect
     Data:
     • 200K servers across 4 production clusters
     • Cluster sizes: 100 to 2500 racks
     Observation:
     • Many rack pairs exchange little traffic
     • Only some hot rack pairs are active
     Implication:
     • Static topology with uniform capacity:
       • Over-provisioned for most rack pairs
       • Under-provisioned for a few others
     Reconfigurable interconnect: dynamically provides additional capacity between hot rack pairs
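The observation on this slide (a few hot rack pairs carry most of the traffic) can be illustrated with a minimal Python sketch. The function name `hot_pairs`, the 80% threshold, and the 4-rack matrix are all hypothetical and chosen only for illustration; they are not taken from the talk's measurement data.

```python
def hot_pairs(traffic, share=0.8):
    """Return the fewest (src, dst) rack pairs that together carry
    at least `share` of the total off-diagonal traffic.

    `traffic[s][d]` is the demand from rack s to rack d (any unit).
    The 0.8 default is an illustrative assumption, not a value from the slides.
    """
    pairs = [
        ((s, d), v)
        for s, row in enumerate(traffic)
        for d, v in enumerate(row)
        if s != d and v > 0
    ]
    # Heaviest pairs first.
    pairs.sort(key=lambda p: p[1], reverse=True)
    total = sum(v for _, v in pairs)

    picked, carried = [], 0
    for pair, v in pairs:
        if carried >= share * total:
            break
        picked.append(pair)
        carried += v
    return picked


# Hypothetical skewed demand matrix for 4 racks.
example = [
    [0, 1, 0, 90],
    [2, 0, 0, 0],
    [0, 60, 0, 1],
    [0, 0, 3, 0],
]
print(hot_pairs(example))
```

In a uniform matrix almost every pair would be needed to reach the threshold; in a skewed one, as here, a handful of pairs suffices, which is the case a reconfigurable interconnect targets.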

  4. Desirable properties of a reconfigurable interconnect
     [Figure: racks A, B, C, D with a static part and a reconfigurable part built on an optical switch]
     • Observation: Traffic matrices differ widely
     • Implication: Difficult to determine the static vs. reconfigurable divide (seamless interconnect)

  5. Desirable properties of a reconfigurable interconnect
     • Observation: Source racks send large amounts of traffic to many other racks
     • Implications:
       • Should create direct links to lots of other racks (high fan-out)
       • Should switch quickly among destinations (low switching time)

  6. Properties of reconfigurable interconnects
     Columns: Enabler technology, Seamless, High fan-out, Low switching time
     • Helios, Mordia [sigcomm'10, sigcomm'13]: Optical Circuit Switch
     • Flyways, 3D Beam forming [hotnets'09, sigcomm'11, sigcomm'12]: 60GHz
     • FireFly [sigcomm'14]: Free-Space Optics
     • ProjecToR: Free-Space Optics

  7. ProjecToR interconnect
     • Free-space topology (seamless)
     • 18,000 fan-out (60x more than optical circuit switches)
     • 12 µs switching time (2500x faster than optical circuit switches)
     [Figure labels: laser, photodetector, static topology]

  8. Reconfiguration in a ProjecToR interconnect
     • Digital micromirror device to redirect light
     • Mirror assembly to magnify reach

  9. Digital Micromirror Device (DMD)
     [Figure labels: array of micromirrors (10 µm), memory cell]

  10. Using DMDs to redirect light
     [Figure: on/off (1/0) bit pattern loaded onto the micromirror array]
     • Theoretical number of accessible locations: total number of micromirrors
       • 768 x 768 = 589,824
     • Cross-talk between adjacent locations
     • Achievable number of accessible locations
       • 768 x 768 / 32 = 18,432
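The fan-out arithmetic on this slide can be checked with a few lines of Python. The function and parameter names are hypothetical; the 768 x 768 array size and the divisor of 32 are the values stated on the slide, and the sketch simply takes the divisor as given rather than modeling cross-talk.

```python
def accessible_locations(mirrors_per_side=768, crosstalk_divisor=32):
    """Return (theoretical, achievable) counts of accessible locations.

    theoretical: one location per micromirror.
    achievable: theoretical count divided by the slide's cross-talk factor.
    """
    theoretical = mirrors_per_side * mirrors_per_side
    achievable = theoretical // crosstalk_divisor
    return theoretical, achievable


theoretical, achievable = accessible_locations()
print(theoretical, achievable)  # 589824 18432
```

The achievable figure of 18,432 is consistent with the rounded "18,000 fan-out" quoted for the interconnect.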

  11. Using mirror assemblies to magnify reach
     • Challenge: DMDs have a narrow angular reach
     • Solution: Coupling DMDs with angled mirrors

  12. Questions to answer
     • How feasible is a ProjecToR interconnect?
       • Built and micro-benchmarked a small ProjecToR prototype
       • Robustness to environmental conditions
     • How should packets be routed in a ProjecToR interconnect?
       • Devised a scheduling algorithm and simulated its performance
     • How much does a ProjecToR interconnect cost?
       • Estimated cost based on the cost breakdown of each component

  13. Prototype: A 3-ToR ProjecToR interconnect
     [Photo labels: ToR 1, ToR 2, ToR 3]

  14. Prototype: A 3-ToR ProjecToR interconnect
     [Photo labels: source laser, DMD, mirrors reflecting to ToR 2 and ToR 3]

  15. Prototype: A 3-ToR ProjecToR interconnect
     [Photo labels: ToR 1, ToR 2, ToR 3]

  16. Prototype: throughput
     [Figure: CDF of TCP throughput (Gbps, 8.8 to 9.4) for the ProjecToR link and a wired link]

  17. Prototype: switching time
     [Photo labels: ToR 1, ToR 2, ToR 3]

  18. Prototype: switching time
     [Figure: receive power (dBm, -50 to -10) vs. time (µs, 0 to 20) while switching from ToR 1 -> ToR 2 to ToR 1 -> ToR 3; annotated switching time: 12 µs]

  19. Connecting lasers and photodetectors
     [Figure: lasers and photodetectors of ToR 1, ToR 2, and ToR 3, joined by a dedicated topology and by opportunistic links]
     • Two-topology approach
       • Slow-switching topology, or dedicated topology
       • Fast-switching links, or opportunistic links

  20. Routing packets
     [Figure: packets held in virtual output queues at ToR 1, sent to ToR 2 and ToR 3 over an opportunistic link or over the dedicated topology]
     • K-shortest paths routing
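K-shortest paths routing can be sketched as follows. This is a minimal illustration, not the scheduler from the talk: it assumes an unweighted topology (path length measured in hops), uses a plain breadth-first enumeration of loop-free paths rather than an optimized algorithm such as Yen's, and the 3-ToR full-mesh adjacency map is a hypothetical stand-in for the dedicated topology.

```python
from collections import deque


def k_shortest_paths(adj, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first.

    `adj` maps each node to an iterable of its neighbors.
    Partial paths are expanded breadth-first, so complete paths are
    discovered in non-decreasing hop count. Worst-case cost grows quickly
    with topology size; this is only suitable for small examples.
    """
    found = []
    queue = deque([[src]])
    while queue and len(found) < k:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:  # keep paths loop-free
                queue.append(path + [nxt])
    return found


# Hypothetical dedicated topology: three ToRs, fully meshed.
dedicated = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(k_shortest_paths(dedicated, 1, 2, k=2))  # [[1, 2], [1, 3, 2]]
```

A packet for ToR 2 that cannot use an opportunistic link could then be forwarded along one of these paths through the dedicated topology.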

  21. Scheduling opportunistic links
     • Given a set of potential links and current traffic demand, find a set of active opportunistic links
     Demand matrix (rows: source ToR 1 to ToR 3, columns: destination ToR 1 to ToR 3):
         0    0  100
       100    0    0
         0    0    0
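The problem statement on this slide can be made concrete with a simple greedy baseline. This is explicitly not the algorithm presented in the talk; it only illustrates the inputs and outputs. The function name and the assumption of one free laser and one free photodetector per ToR are hypothetical. The demand matrix is the one shown on the slide, with ToRs indexed from 0.

```python
def greedy_schedule(demand, lasers_per_tor=1, detectors_per_tor=1):
    """Pick active links greedily, heaviest demand first.

    A link (s, d) is activated only if source s still has a free laser
    and destination d still has a free photodetector.
    """
    n = len(demand)
    candidates = [
        (demand[s][d], s, d)
        for s in range(n)
        for d in range(n)
        if s != d and demand[s][d] > 0
    ]
    # Largest demand first; ties broken by (source, destination) index.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    free_tx = [lasers_per_tor] * n
    free_rx = [detectors_per_tor] * n
    active = []
    for _, s, d in candidates:
        if free_tx[s] > 0 and free_rx[d] > 0:
            active.append((s, d))
            free_tx[s] -= 1
            free_rx[d] -= 1
    return active


# Demand matrix from the slide (index 0 is ToR 1).
demand = [
    [0, 0, 100],
    [100, 0, 0],
    [0, 0, 0],
]
print(greedy_schedule(demand))  # [(0, 2), (1, 0)]
```

For this matrix the result is the pair of links ToR 1 -> ToR 3 and ToR 2 -> ToR 1, which serves all of the demand.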

  22. Scheduling opportunistic links
     • Standard switch scheduling problem (inputs to outputs)
       • Blossom matching
       • Matrix decomposition
       • Centralized scheduler
       • Single-tiered matching
     Demand matrix (rows: source, columns: destination):
         0  100    0
         0    0  100
       100    0    0

  23. Scheduling opportunistic links
     • Standard switch scheduling problem (inputs to outputs)
       • Blossom matching
       • Matrix decomposition
       • Centralized scheduler
       • Single-tiered matching
     • ProjecToR scheduling between Src ToRs and Dst ToRs: decentralized, two-tiered
       • Extended the Gale-Shapley algorithm for finding stable matches [GS-1962]
       • Constant competitive against an offline optimal allocation
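For reference, the classic Gale-Shapley stable matching algorithm that this slide builds on can be sketched as below. This is the textbook one-to-one version only; the decentralized, two-tiered extension described in the talk is not reproduced here. The preference lists and the `s1`/`d1` names are hypothetical, and the sketch assumes every acceptor ranks every proposer that may propose to it.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Textbook Gale-Shapley: returns a stable matching {proposer: acceptor}.

    Each dict maps a participant to its preference list, best first.
    """
    # rank[a][p]: position of proposer p in acceptor a's list (lower is better).
    rank = {
        a: {p: i for i, p in enumerate(prefs)}
        for a, prefs in acceptor_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # acceptor -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has proposed to everyone and stays unmatched
        a = prefs[next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p          # a trades up to p
            free.append(current)    # the displaced proposer tries again
        else:
            free.append(p)          # rejected; p tries its next choice
    return {p: a for a, p in engaged.items()}


# Hypothetical preferences: source ToRs propose to destination ToRs.
src_prefs = {
    "s1": ["d1", "d2", "d3"],
    "s2": ["d1", "d3", "d2"],
    "s3": ["d2", "d1", "d3"],
}
dst_prefs = {
    "d1": ["s2", "s1", "s3"],
    "d2": ["s1", "s3", "s2"],
    "d3": ["s1", "s2", "s3"],
}
print(gale_shapley(src_prefs, dst_prefs))
```

In a scheduling setting the preference lists would be derived from traffic demand, for example with each side ranking the other by queue length; that mapping is an assumption here, not a detail given on the slide.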

  24. Simulations
     Compared: fat tree, FireFly, ProjecToR
     • 128 ToRs (1024 servers) with 16 lasers and photodetectors
     • Day-long traffic matrix: to build the dedicated topology
     • 5-min traffic matrix: to generate the probability of ToR pair communication
     • TCP flow arrivals follow a Poisson process, with realistic flow sizes
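Poisson flow arrivals of the kind mentioned on this slide can be generated by drawing exponentially distributed inter-arrival gaps. The sketch below is a generic illustration using Python's standard library; the function name, the rate of 1000 flows per second, and the fixed seed are hypothetical, and flow sizes and ToR pair selection are not modeled.

```python
import random


def poisson_arrivals(rate_per_s, duration_s, seed=0):
    """Return sorted flow arrival times (seconds) in [0, duration_s).

    Inter-arrival gaps are drawn from an exponential distribution with
    mean 1 / rate_per_s, which yields a Poisson arrival process.
    """
    rng = random.Random(seed)  # fixed seed for repeatable runs
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


arrivals = poisson_arrivals(rate_per_s=1000, duration_s=1.0)
print(len(arrivals))  # close to 1000 on average
```

Raising or lowering `rate_per_s` is one way to sweep the offered load in such a simulation.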

  25. Simulation results
     [Figure: average flow completion time (ms, 0 to 40) vs. average load (%, 20 to 80) for FireFly, fat tree, and ProjecToR; the plot carries a "95%" annotation]
     • FireFly: slow switching time, low fan-out
     • Fat tree: no reconfigurability
     • ProjecToR: reconfigurable, switching time of 12 µs, high fan-out
     Further results:
     • Tail flow completion time
     • Different traffic matrices
     • Impact of fan-out
     • Impact of switching time

  26. ProjecToR: A reconfigurable data center
     [Figure labels: ToR 1, ToR 2, ToR 3]
     • Seamless, high fan-out, low switching time interconnect
     • Small prototype demonstrates feasibility
     • Decentralized flow scheduling algorithm
