
Efficient Distributed Workload (Re-)Embedding



  1. Efficient Distributed Workload (Re-)Embedding. Monika Henzinger, Stefan Neumann, Stefan Schmid

  2. Many Years Ago
 • Single server
 • Systems were fixed and workload-agnostic
 • Simple communication patterns (if any), endpoints fixed
 Image: https://www.flickr.com/photos/jurvetson/157722937

  5. Nowadays
 • Large distributed systems (even geographically distributed): communication over network
 • Virtualization technologies enable workload-aware operations that improve system efficiency
 • Communicating processes can be far away and re-locating them is costly
 • Communication requests contain patterns
 New challenge: When to re-locate workloads? How to exploit the patterns?
 Images: https://wikileaks.org/amazon-atlas/map/ and https://commons.wikimedia.org/wiki/File:Bacloud.com_data_center.JPG

  9. The Model: ℓ servers (illustrated as data centers and rack-scale computing)

  10. The Model: ℓ servers with n occupied VM slots, i.e., n virtual machines (VMs). The VMs are the workloads.

  11. The Model: besides the n occupied VM slots, there are εn additional free slots for VMs

  16. The Model: Communication requests arrive online (the figure shows three requests, each labeled 0)

  17. The Model: Old communication links stay forever

  21. The Model: Communication requests arrive online (the figure shows three requests, each labeled 1)

  22. The Model: Re-locate VMs to avoid cross-server communication; re-location cost α > 1

  33. The Model
 • Internal server communication cost: 0
 • Server-server communication cost: 1
 • VM re-location cost: α
 ➡ Given an online sequence of communication requests, minimize total cost paid for communication
 After all communications finished: 1 server = 1 component
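The three cost terms above can be made concrete with a small accounting routine. This is only a sketch under stated assumptions: the event format (`"request"` and `"move"` tuples) and the name `total_cost` are hypothetical and not from the slides, and a re-location is assumed to cost α per moved VM.

```python
def total_cost(initial_server, events, alpha):
    """Sum the cost of a schedule under the slide's cost model.

    initial_server: dict mapping VM -> server (hypothetical input format).
    events: list of ("request", u, v) or ("move", vm, target_server).
    alpha: re-location cost per moved VM (assumed per-VM, alpha > 1).
    """
    server = dict(initial_server)  # current placement, copied so input is untouched
    cost = 0.0
    for kind, a, b in events:
        if kind == "request":
            # Internal communication is free, cross-server communication costs 1.
            cost += 0 if server[a] == server[b] else 1
        elif kind == "move":
            # Moving a VM to the server it is already on costs nothing.
            if server[a] != b:
                server[a] = b
                cost += alpha
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return cost


schedule = [("request", "a", "b"), ("move", "b", 0), ("request", "a", "b")]
print(total_cost({"a": 0, "b": 1}, schedule, alpha=3))
```

In this example the first request crosses servers (cost 1), the move costs α = 3, and the second request is then internal (cost 0).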

  34. Analysis
 • Competitive analysis comparing to OPT:
 • OPT knows all communications in advance
 • OPT computes solution with optimal cost
 • (Strict) competitive ratio = ALG / OPT
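Written out, the strict competitive ratio is the standard worst-case guarantee over all request sequences. The symbols σ (a request sequence) and c (the ratio) are notation assumed here for illustration; the slide itself only gives ALG and OPT.

```latex
% An online algorithm ALG is strictly c-competitive if
\[
  \mathrm{ALG}(\sigma) \;\le\; c \cdot \mathrm{OPT}(\sigma)
  \quad \text{for every request sequence } \sigma ,
\]
% where ALG(sigma) and OPT(sigma) denote the total cost paid on sigma.
% The non-strict variant additionally allows an additive constant on the right-hand side.
```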

  35. Results
 • For ℓ = 2 servers:
 • Algorithm which is O((log n)/ε)-competitive
 • Lower bound: Any algorithm must be Ω(1/ε + log n)-competitive
 ➡ Our results are almost tight for two servers

  36. Results
 • For ℓ servers:
 • Algorithm which is O((ℓ log n log ℓ)/ε)-competitive
 ➡ Efficient when ℓ is small, e.g., for communication across data centers
 ➡ Implementable for distributed computation: communication cost ≤ communication for re-locating VMs (if ℓ = O(εn))

  37. Applications
 • Distributed Union Find Data Structure (with small cost for re-locating the sets across servers)
 • Online Balanced k-way Partition (with small cost for re-assigning numbers to balanced partitions)

  38. Algorithm for Two Servers: Color each VM based on its initial server

  41. Algorithm for Two Servers: Move small component to larger one

  55. Algorithm for Two Servers: Majority-voting step. The component contains more yellow than green VMs, so assign it to the yellow server. Ensures that we stay close to initial assignment.

  56. For each new communication request:
 • Move smaller component to the server of the larger one
 • If size of new component exceeds a power of 2: Perform majority-voting step
 • If server capacity exceeded: Find cheapest balanced assignment using brute-force enumeration (can only happen O((log n)/ε) times)
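The per-request rule above could be sketched as follows. This is an illustrative reading, not the authors' implementation, and several details are assumptions: the class and method names are hypothetical, "exceeds a power of 2" is read as "the merged size passes a power of two that the larger part had not passed", ties in the vote keep the current server, the request is paid for before any move, and "cheapest" means fewest VMs moved from the current placement.

```python
from itertools import product


def crossed_power_of_two(old_size, new_size):
    """True if some power of two p satisfies old_size <= p < new_size (assumed reading)."""
    p = 1
    while p < old_size:
        p *= 2
    return p < new_size


class TwoServerEmbedding:
    def __init__(self, initial_server, capacity, alpha):
        self.color = list(initial_server)    # initial server of each VM (0 or 1)
        self.server = list(initial_server)   # current server of each VM
        self.capacity = capacity             # max VMs per server (assumed uniform)
        self.alpha = alpha                   # re-location cost per VM
        self.parent = list(range(len(self.color)))
        self.members = {v: [v] for v in range(len(self.color))}  # root -> VMs
        self.cost = 0.0

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def move(self, comp, target):
        for v in comp:
            if self.server[v] != target:
                self.server[v] = target
                self.cost += self.alpha

    def request(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return  # same component, hence same server: cost 0
        if self.server[u] != self.server[v]:
            self.cost += 1  # cross-server communication
        if len(self.members[ru]) < len(self.members[rv]):
            ru, rv = rv, ru
        big, small = self.members[ru], self.members.pop(rv)
        old_size = len(big)
        self.move(small, self.server[big[0]])  # smaller component joins the larger one
        self.parent[rv] = ru
        big.extend(small)
        if crossed_power_of_two(old_size, len(big)):
            ones = sum(self.color[x] for x in big)  # VMs initially on server 1
            zeros = len(big) - ones
            if ones != zeros:  # majority vote; on a tie the component stays put
                self.move(big, 1 if ones > zeros else 0)
        if max(self.server.count(0), self.server.count(1)) > self.capacity:
            self.rebalance()

    def rebalance(self):
        """Brute force over all component-to-server assignments (exponential)."""
        roots, best = list(self.members), None
        for assign in product((0, 1), repeat=len(roots)):
            loads, moved = [0, 0], 0
            for r, s in zip(roots, assign):
                comp = self.members[r]
                loads[s] += len(comp)
                moved += len(comp) if self.server[comp[0]] != s else 0
            if max(loads) <= self.capacity and (best is None or moved < best[0]):
                best = (moved, assign)
        if best is None:
            raise ValueError("no assignment respects the capacity")
        for r, s in zip(roots, best[1]):
            self.move(self.members[r], s)
```

The enumeration in `rebalance` is exponential in the number of components, which is only tolerable because the slide bounds how often the capacity case can occur.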

  62. Generalization to ℓ Servers (figure: tree over the servers S0, S1, ..., S7): Traverse tree from root downwards and perform majority voting step at each internal node
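One way to read the tree traversal is as a repeated two-way vote that narrows down a range of servers. The sketch below assumes a balanced binary tree whose leaves are the servers in index order, and it breaks ties towards the left subtree; both choices, and the name `tree_majority_vote`, are assumptions rather than details given on the slide.

```python
def tree_majority_vote(counts):
    """Pick a server for one component by voting at each internal tree node.

    counts[i] is the number of VMs in the component whose initial server is S_i.
    The implicit tree splits the server range [lo, hi) in half at every node.
    """
    lo, hi = 0, len(counts)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = sum(counts[lo:mid])    # VMs that started in the left subtree
        right = sum(counts[mid:hi])   # VMs that started in the right subtree
        if left >= right:             # tie goes left (assumed)
            hi = mid
        else:
            lo = mid
    return lo  # index of the chosen server


# Component with 1 VM from S0 and 2 VMs each from S5 and S6, on 8 servers.
print(tree_majority_vote([1, 0, 0, 0, 0, 2, 2, 0]))
```

Here the root vote goes right (4 VMs against 1), the next vote is a tie between {S4, S5} and {S6, S7} and goes left, and the last vote picks S5.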
