Programming the Topology of Networks: Technology and Algorithms
Manya Ghobadi, Pierre Blanche, Klaus-Tycho Foerster, Daniel Kilper, Gireeja Ranade, Jeff Cox, Jamie Gaudette, Janardhan Kulkarni, Houman Rastegarfar, Nikhil Devanur, Phillipa Gill, Ratul Mahajan, Stefan Schmid, Mark Filer, Madeleine Glick, Amar Phanishayee, Rachee Singh
The cloud infrastructure
• Inefficiencies and waste in the cloud infrastructure
• Data centers
• Optical cables
Technology and algorithms to optimize network topology
• Network topology
• Traffic engineering
• Capacity provisioning
High-level idea: networks with programmable topologies
[Figure: a four-node network (A, B, C, D) with unit-capacity links. Static topology: throughput of 2 units. Dynamic topology: throughput of 3 units.]
Programmable topologies
• Challenging:
  • Requires reconfigurable hardware technology
  • Requires revisiting networking-layer algorithms
• Impactful:
  • Cheaper networks
  • Higher throughput
Talk outline: technology and algorithms to enable programmable topologies in the cloud
• Wide-area networks (Level3 global backbone): programming the capacity of links [SIGCOMM’18, HotNets’17, OFC’16]
• Data center networks (Google data center): ProjecToR, programming the network topology [SIGCOMM’16]
Today’s data center interconnects
[Figure: ToRs A–D interconnected with static 10 Gbps links, alongside two ToR-to-ToR demand matrices.]
• Ideal demand matrix: uniform and static (every ToR pair exchanges the same traffic)
• Non-ideal demand matrix: skewed and dynamic
• Static capacity between Top-of-Rack (ToR) pairs
Need for a reconfigurable interconnect
• Data: 200K servers across 4 production clusters; cluster sizes: 100–2500 racks
• Observation: many rack pairs exchange little traffic; only some hot rack pairs are active
• Implication: a static topology with uniform capacity is over-provisioned for most rack pairs and under-provisioned for a few others
• Reconfigurable interconnect: dynamically provide additional capacity between hot rack pairs
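To make the "few hot rack pairs" observation concrete, here is a small hedged sketch in Python. The numbers are entirely synthetic, not the production traces from the four clusters, and the `traffic_skew` helper and its parameters are illustrative assumptions; it simply measures what share of total traffic the hottest rack pairs in a ToR-to-ToR demand matrix carry.

```python
import numpy as np

def traffic_skew(demand: np.ndarray, top_fraction: float = 0.05) -> float:
    """Return the share of total traffic carried by the hottest rack pairs.

    demand[i, j] is the traffic from rack i to rack j; the diagonal is ignored.
    top_fraction selects, e.g., the top 5% of rack pairs by volume.
    """
    d = demand.astype(float).copy()
    np.fill_diagonal(d, 0.0)                  # a rack does not send to itself
    pairs = np.sort(d.ravel())[::-1]          # all pair volumes, hottest first
    k = max(1, int(top_fraction * pairs.size))
    return pairs[:k].sum() / pairs.sum()

# Illustrative (synthetic) skewed matrix: a few hot pairs, many near-idle ones.
rng = np.random.default_rng(0)
demand = rng.exponential(scale=1.0, size=(100, 100))
demand[3, 17] = demand[42, 8] = 500.0         # hypothetical hot rack pairs
print(f"Top 5% of rack pairs carry {traffic_skew(demand):.0%} of the traffic")
```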
Our proposal: the ProjecToR interconnect
• Free-space topology (programmable)
• Digital micromirror device to redirect light
• Disco-ball-shaped mirror assembly to magnify reachability
[Figure: lasers and photodetectors on each rack, compared with a static topology.]
Digital Micromirror Device (DMD)
• Array of micromirrors (~10 µm each), with each mirror controlled by an underlying memory cell
A 3-ToR ProjecToR interconnect prototype
[Figure: a source laser at ToR 1 illuminates a DMD; mirrors reflect the beam to ToR 2 and ToR 3.]
Routing algorithm
[Figure: lasers at ToRs 1–3 can be steered to photodetectors at ToRs 1–3.]
• We have a highly flexible topology, allowing millions of ways to connect lasers to photodetectors
• Ideal solution: a fast-changing topology that adapts as demand changes
• Challenge: it takes 12 µs to reprogram a link
Routing algorithm
[Figure: a dedicated topology and opportunistic links among ToRs 1–3.]
• Two-topology approach:
  • A slow-switching dedicated topology
  • Fast-switching opportunistic links
Routing packets
[Figure: virtual output queues at each ToR; packets travel over the dedicated topology or over an opportunistic link.]
• K-shortest-paths routing over the dedicated topology
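As a rough illustration of the routing side, the sketch below enumerates k-shortest paths over a small assumed dedicated topology using networkx's `shortest_simple_paths` (Yen's algorithm). The three-ToR graph, the relay choice, and the value of k are made-up examples, not the deployed configuration; an opportunistic link, once granted by the scheduler, would simply be used directly.

```python
import itertools
import networkx as nx

def k_shortest_paths(topology: nx.Graph, src: str, dst: str, k: int = 3):
    """Enumerate up to k loop-free paths from src to dst, shortest first."""
    return list(itertools.islice(
        nx.shortest_simple_paths(topology, src, dst), k))

# Hypothetical dedicated topology over three ToRs (illustration only).
dedicated = nx.Graph()
dedicated.add_edges_from([("ToR1", "ToR2"), ("ToR2", "ToR3"), ("ToR1", "ToR3")])

# Traffic from ToR1 to ToR3 can use the direct dedicated link or relay via ToR2;
# an active opportunistic link would be preferred when one is scheduled.
for path in k_shortest_paths(dedicated, "ToR1", "ToR3", k=2):
    print(" -> ".join(path))
```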
Scheduling opportunistic links
• Given a set of potential links and the current traffic demand, find a set of active opportunistic links
[Figure: a 3×3 source-to-destination demand matrix over ToRs 1–3.]
Scheduling opportunistic links
• A standard switch-scheduling problem:
  • Blossom matching of inputs to outputs
  • BvN (Birkhoff–von Neumann) matrix decomposition
  • Centralized scheduler
  • Single-tiered matching
Scheduling opportunistic links
• Standard switch scheduling (Blossom matching, BvN matrix decomposition) relies on a centralized scheduler and single-tiered matching
• ProjecToR instead uses a decentralized, two-tiered matching of source ToRs to destination ToRs
• Extends the Gale-Shapley algorithm for finding stable matches [GS-1962]
• Constant competitive against an offline optimal allocation
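The paper's decentralized, two-tiered extension of Gale-Shapley is not reproduced here; the following is only a minimal, single-tier Gale-Shapley-style sketch in which source ToRs (lasers) propose to destination ToRs (photodetectors) in order of pending demand, and each destination keeps the proposer with the most demand toward it. The function name, the preference rule, and the toy demand values are assumptions for illustration, not the paper's exact algorithm.

```python
from collections import deque

def schedule_opportunistic_links(demand: dict) -> dict:
    """Gale-Shapley-style one-to-one matching of source ToRs (lasers) to
    destination ToRs (photodetectors), driven by pending demand.

    demand[(s, d)] is the queued traffic from source s to destination d.
    Returns an assignment {source: destination}. Illustrative sketch only.
    """
    sources = sorted({s for s, _ in demand})
    dests = sorted({d for _, d in demand})

    # Each source ranks destinations by pending demand (higher = preferred).
    src_prefs = {s: deque(sorted((d for d in dests if d != s),
                                 key=lambda d: -demand.get((s, d), 0)))
                 for s in sources}

    def dst_rank(d, s):                       # smaller rank = more preferred
        return -demand.get((s, d), 0)

    match = {}                                # destination -> source
    free = deque(sources)
    while free:
        s = free.popleft()
        if not src_prefs[s]:
            continue                          # source exhausted its proposals
        d = src_prefs[s].popleft()            # propose to next-best destination
        if d not in match:
            match[d] = s
        elif dst_rank(d, s) < dst_rank(d, match[d]):
            free.append(match[d])             # displaced source proposes again
            match[d] = s
        else:
            free.append(s)                    # rejected; try the next destination
    return {s: d for d, s in match.items()}

# Toy 3-ToR demand with a few hot pairs (made-up numbers).
demand = {("ToR1", "ToR3"): 100, ("ToR2", "ToR1"): 100, ("ToR3", "ToR2"): 80,
          ("ToR1", "ToR2"): 5,   ("ToR3", "ToR1"): 10}
print(schedule_opportunistic_links(demand))
```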
Simulation results
[Figure: average flow completion time (ms) vs. average load (%). ProjecToR (reconfigurable, 12 µs switching time) outperforms FireFly [SIGCOMM’14] (slower switching time) and a fat tree [SIGCOMM’08] (no reconfigurability).]
• Also evaluated: tail (95th-percentile) flow completion time, different traffic matrices, impact of switching time, impact of fan-out
The key takeaway from this talk
• Current assumption: the network topology is fixed
• New world: the network topology is dynamic
[Figure: a small capacitated graph with nodes s, t, u, v, w, y and labeled edge capacities.]
• Problems to solve: scheduling, capacity provisioning, traffic engineering, load balancing
• Exciting: an unusual wealth of algorithms
• Challenging: changes fundamental assumptions
• Impactful: better efficiency ($/Gbps)