
Nice to meet you! The Network Matters Cloud-based applications - PowerPoint PPT Presentation

Competitive Clustering of Stochastic Communication Patterns on the Ring Chen Avin Louis Cohen Stefan Schmid Nice to meet you! The Network Matters Cloud-based applications generate significant network traffic E.g., scale-out


  1. Competitive Clustering of Stochastic Communication Patterns on the Ring. Chen Avin, Louis Cohen, Stefan Schmid. Nice to meet you!

  2. The Network Matters ❏ Cloud-based applications generate significant network traffic ❏ E.g., scale-out databases, streaming, batch processing applications ❏ E.g., Hadoop TeraSort job: shuffle phase

  3. Example: VM Placement ❏ Virtual machine placement affects bandwidth costs ❏ Example: MapReduce in a Clos datacenter

  4. Example: VM Placement ❏ Virtual machine placement affects bandwidth costs ❏ Example: MapReduce in a Clos datacenter (Figure: mappers and reducers of tenant 1 and tenant 2)

  5. Example: VM Placement ❏ Virtual machine placement affects bandwidth costs ❏ Example: MapReduce in a Clos datacenter. Distributed across pods: costly shuffling! (Figure: mappers and reducers of tenant 1 and tenant 2)

  6. Example: VM Placement ❏ Virtual machine placement affects bandwidth costs ❏ Example: MapReduce in a Clos datacenter. Locally clustered within a rack or pod: efficient! (Figure: mappers and reducers of tenant 1 and tenant 2)

  7. Example: VM Placement ❏ Virtual machine placement affects bandwidth costs ❏ Example: MapReduce in a Clos datacenter. Locally clustered within a rack or pod: efficient! Communication patterns are often clustered (but can change over time). (Figure: mappers and reducers of tenant 1 and tenant 2)

  8. How to support local communication?

  9. How to support local communication? Option 1: Change the topology (?!)

  10. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors

  11. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors. We are working on it! E.g., „SplayNets @ TON 2016“. But not today!

  12. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors. Option 2: Cluster the nodes ❏ Migrate frequently communicating nodes closer together

  13. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors. Option 2: Cluster the nodes (Today!) ❏ Migrate frequently communicating nodes closer together

  14. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors. Option 2: Cluster the nodes ❏ Migrate frequently communicating nodes closer together ❏ Challenges of communication pattern clustering: ❏ Communication patterns are not known ahead of time… ❏ … and may even change over time!

  15. How to support local communication? Option 1: Change the topology (?!) ❏ Theory of demand-aware networks ❏ Prototypes emerging: e.g., ProjectToR (SIGCOMM 2016) ❏ Based on lasers and mirrors. Option 2: Cluster the nodes ❏ Migrate frequently communicating nodes closer together ❏ Challenges of communication pattern clustering: ❏ Communication patterns are not known ahead of time… ❏ … and may even change over time! Thus: Need to repartition clusters in an online manner, depending on demand!

  16. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4 How to cluster?

  17. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4 How to cluster? Thickness of line = amount of communication

  18. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4

  19. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4. Most communication within cluster (intra-cluster)… … little inter-cluster communication.

  20. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4 (Figure: nodes labeled 1 to 6) ❏ Now assume: changes in communication pattern! ❏ E.g., more communication (1,3),(3,4),(2,5) but less (5,6)

  21. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4 (Figure: nodes labeled 1 to 6, with nodes 1 and 5 shown in both their old and new positions) ❏ Now assume: changes in communication pattern! ❏ E.g., more communication (1,3),(3,4),(2,5) but less (5,6)

  22. Example: A Re Partitioning Problem ❏ Example: 4 clusters of size 4 (Figure: nodes labeled 1 to 6, with nodes 1 and 5 shown in both their old and new positions) ❏ Now assume: changes in communication pattern! ❏ E.g., more communication (1,3),(3,4),(2,5) but less (5,6). Nodes 1 and 5 change clusters!
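
The effect of the swap on slide 22 can be illustrated with a small sketch. The instance below is a hypothetical toy (two clusters, made-up traffic weights), not the figure from the slides; the helper names `inter_cluster_traffic` and `swap` are likewise introduced only for illustration.

```python
def inter_cluster_traffic(cluster_of, traffic):
    """Total traffic on pairs whose two endpoints sit in different clusters."""
    return sum(w for (u, v), w in traffic.items() if cluster_of[u] != cluster_of[v])


def swap(cluster_of, a, b):
    """Return a new placement in which nodes a and b have exchanged clusters."""
    moved = dict(cluster_of)
    moved[a], moved[b] = cluster_of[b], cluster_of[a]
    return moved


# Hypothetical toy instance (not the slide's figure): two clusters, made-up weights.
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
# More communication on (1,3), (3,4), (2,5); less on (5,6).
traffic = {(1, 3): 5, (3, 4): 5, (2, 5): 5, (5, 6): 1}

before = inter_cluster_traffic(cluster_of, traffic)
after = inter_cluster_traffic(swap(cluster_of, 1, 5), traffic)
print(before, after)
```

Swapping nodes 1 and 5 keeps both cluster sizes unchanged while moving the heavy pairs inside a cluster; only the light pair (5,6) remains split.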

  23. Online Re Partitioning A simple and fundamental model (e.g., a rack): servers („clusters“), each of size k („# slots“)

  24. Online Re Partitioning A simple and fundamental model (e.g., a rack): servers („clusters“), each of size k („# slots“). Minimize inter-cluster communication … … maximize intra-cluster communication!

  25. Online Re Partitioning A simple and fundamental model (e.g., a rack): servers („clusters“), each of size k („# slots“). Minimize inter-cluster communication … … maximize intra-cluster communication! Also: minimize migrations (=swap)!

  26. Online Re Partitioning A simple and fundamental model: servers („clusters“), each of size k („# slots“). In practice: k ≪ number of servers (many more servers than VM slots per server)! Minimize inter-cluster communication … … maximize intra-cluster communication! Also: minimize migrations (=swap)!

  27. Online Re Partitioning Problem inputs: k, , Communication pattern over time

  28. Online Re Partitioning Problem inputs: k, , α 0 Costs: 1 Objective:

  29. Online Re Partitioning Problem inputs: k, , Costs: α 0 1 Objective: Two flavors: (1) online (worst-case) pattern (2) learning: from a fixed (unknown) distribution
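
A minimal sketch of how the cost of a schedule could be accounted for. The slide lists the cost values α, 0 and 1 without their labels surviving in this transcript, so the mapping used here is an assumption: an inter-cluster request costs 1, an intra-cluster request costs 0, and moving one node costs α (so a swap of two nodes costs 2α). The function name `total_cost` and the `swaps_before` schedule format are hypothetical.

```python
def total_cost(requests, cluster_of, swaps_before, alpha):
    """Cost of serving `requests` in order, given a fixed swap schedule.

    swaps_before maps a time step t to the swaps (a, b) executed just
    before request t is served.
    """
    placement = dict(cluster_of)
    cost = 0
    for t, (u, v) in enumerate(requests):
        for a, b in swaps_before.get(t, []):
            placement[a], placement[b] = placement[b], placement[a]
            cost += 2 * alpha  # assumption: a swap moves two nodes, alpha each
        if placement[u] != placement[v]:
            cost += 1  # assumption: inter-cluster request costs 1, intra-cluster 0
    return cost


# Hypothetical instance: two clusters of size 2, five requests between nodes 1 and 3.
start = {1: "A", 2: "A", 3: "B", 4: "B"}
requests = [(1, 3)] * 5

never_migrate = total_cost(requests, start, {}, alpha=1)
migrate_first = total_cost(requests, start, {0: [(1, 4)]}, alpha=1)
migrate_late = total_cost(requests, start, {2: [(1, 4)]}, alpha=1)
print(never_migrate, migrate_first, migrate_late)
```

An online algorithm has to pick the swap schedule without seeing future requests, which is what separates the late migration from the early one in this toy run.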

  30. The Crux: Algorithmic Challenges A) Serve remotely or migrate (“rent or buy”)? When to migrate? If a communication pattern is short-lived, it may not be worthwhile to collocate the nodes: the migration cost cannot be amortized.

  31. The Crux: Algorithmic Challenges A) Serve remotely or migrate (“rent or buy”)? When to migrate? If a communication pattern is short-lived, it may not be worthwhile to collocate the nodes: the migration cost cannot be amortized. B) Where to migrate, and what? If nodes should be collocated, the question becomes where. Should the first node be migrated to the cluster of the second or vice versa? Or shall both be moved together to a new cluster? Moreover, an algorithm may be required to pro-actively migrate (resp. swap) additional nodes.

  32. The Crux: Algorithmic Challenges A) Serve remotely or migrate (“rent or buy”)? When to migrate? If a communication pattern is short-lived, it may not be worthwhile to collocate the nodes: the migration cost cannot be amortized. B) Where to migrate, and what? If nodes should be collocated, the question becomes where. Should the first node be migrated to the cluster of the second or vice versa? Or shall both be moved together to a new cluster? Moreover, an algorithm may be required to pro-actively migrate (resp. swap) additional nodes. C) Which nodes to evict? There may not exist sufficient space in the desired destination cluster. In this case, the algorithm needs to decide which nodes to evict, to free up space.
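
Challenge A has the flavor of the classic ski-rental problem, and a simple counter-based rule illustrates it. This is a generic rent-or-buy heuristic, not the algorithm from the talk: keep serving a pair remotely until the remote cost paid for that pair reaches the migration cost α, then signal that collocating has become worthwhile. The class name `RentOrBuy` and its method are hypothetical, and the sketch deliberately leaves challenges B (where to migrate) and C (which nodes to evict) open.

```python
from collections import defaultdict


class RentOrBuy:
    """Per-pair counter: 'rent' (serve remotely) until alpha has been paid."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.paid = defaultdict(int)  # remote cost paid so far, per node pair

    def on_remote_request(self, u, v):
        """Record one inter-cluster request; return True when it is time to 'buy'."""
        pair = frozenset((u, v))  # (u, v) and (v, u) are the same pair
        self.paid[pair] += 1
        if self.paid[pair] >= self.alpha:
            self.paid[pair] = 0  # reset after signalling a migration
            return True
        return False


policy = RentOrBuy(alpha=3)
decisions = [policy.on_remote_request(1, 3),
             policy.on_remote_request(3, 1),
             policy.on_remote_request(1, 3)]
print(decisions)
```

A short-lived pattern never reaches the threshold, so its migration cost is never paid; a persistent one triggers a migration after having paid about as much remotely as the migration itself costs.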

  33. Online Variant: Competitive Ratio and Augmentation ❏ Goal: minimize competitive ratio

  34. Online Variant: Competitive Ratio and Augmentation ❏ Goal: minimize competitive ratio ❏ Two flavors: without and with augmentation
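
For reference, the standard textbook definition of the competitive ratio can be written as follows. The notation is an assumption, since the slides do not show it: σ denotes a request sequence, ALG(σ) the cost of the online algorithm on σ, and OPT(σ) the cost of an optimal offline algorithm that knows σ in advance.

```latex
\rho(\mathrm{ALG}) \;=\; \sup_{\sigma} \frac{\mathrm{ALG}(\sigma)}{\mathrm{OPT}(\sigma)}
```

With resource augmentation, the online algorithm is typically granted more resources than the offline optimum it is compared against; here that would presumably mean larger clusters, though the slide does not spell this out.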

  35. Let’s first look at special case: k = 2

  36. Let’s first look at special case: k = 2. Need to find pairs!

  37. Let’s first look at special case: k = 2. Need to find pairs! Clusters of size 2: A new type of online matching problem!

  38. Special Cases: =2

  39. Special Cases: =2 2 Clusters: A generalization of online caching!

  40. Special Cases: =2 (“Online Caching”) ❏ For 2 clusters: can emulate online caching! ❏ k items, cache size k-1 (Figure: one cluster models the cache, the other models the disk)

  41. Special Cases: =2 (“Online Caching”) ❏ For 2 clusters: can emulate online caching! ❏ k items, cache size k-1 (Figure: the cache cluster holds k-1 items plus some dummy item d; the other cluster is the disk)

  42. Special Cases: =2 (“Online Caching”) ❏ For 2 clusters: can emulate online caching! ❏ k items, cache size k-1 ❏ When item i is requested in original caching problem: ❏ Introduce many requests between d and i: forces i to cache (if it is not yet) (Figure: the cache cluster holds k-1 items plus the dummy item d; the other cluster is the disk)

  43. Special Cases: =2 (“Online Caching”) ❏ For 2 clusters: can emulate online caching! ❏ k items, cache size k-1 ❏ When item i is requested in original caching problem: ❏ Introduce many requests between d and i: forces i to cache (if it is not yet) ❏ Which one to evict? Caching problem! (Figure: the cache cluster holds k-1 items plus the dummy item d; the other cluster is the disk)
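
The reduction on slides 40 to 43 can be sketched as a translation of request sequences. Each request for item i in the caching problem becomes a burst of communication requests between the dummy item d and i. The slides only say "many" requests, so the `repeats` parameter is a placeholder, and the function name `emulate_caching` is hypothetical.

```python
def emulate_caching(cache_requests, dummy="d", repeats=10):
    """Turn a caching request sequence into pairwise communication requests.

    Each requested item yields `repeats` requests between the dummy node and
    that item. `repeats` stands in for the slide's unspecified "many".
    """
    pair_requests = []
    for item in cache_requests:
        pair_requests.extend([(dummy, item)] * repeats)
    return pair_requests


pairs = emulate_caching(["a", "b"], repeats=2)
print(pairs)
```

Because the dummy sits in the cache cluster, a long enough burst makes it cheaper to bring the item into that cluster than to keep paying for remote requests, and the choice of which node to swap out is exactly the eviction decision of the caching problem.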
