Efficient Distributed Workload (Re-)Embedding
Stefan Neumann Monika Henzinger Stefan Schmid
Many Years Ago
Single server
Systems were fixed and workload-agnostic
Simple communication patterns (if at all), endpoints fixed
https://www.flickr.com/photos/jurvetson/157722937
(even geographically distributed):
communication over network
enable workload-aware
efficiency
be far away and re-locating them is costly
https://wikileaks.org/amazon-atlas/map/ https://commons.wikimedia.org/wiki/File:Bacloud.com_data_center.JPG
contain patterns
New challenge: How to exploit the patterns? When to re-locate workloads?
data centers, RACK scale computing
[Figure: ℓ servers, each with n VM slots]
The VMs are the workloads.
[Figure: ℓ servers, each with n VM slots and εn free VM slots]
[Figure: communication requests between VMs, each labeled with cost 1]
Re-location cost: α > 1
➡ Given an online sequence of communication requests,
minimize total cost paid for communication
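The objective above can be made concrete with a minimal sketch. It assumes that a communication request between two VMs on different servers costs 1, that a request inside one server is free, and that re-locating one VM costs α. The function name `total_cost` and the event encoding are illustrative and do not come from the talk; server capacities are not checked here.

```python
def total_cost(initial_placement, events, alpha):
    """Replay a sequence of events and return the total cost paid.

    initial_placement: dict mapping VM -> server (hypothetical representation).
    events: list of ("comm", u, v) or ("move", vm, server) tuples.
    alpha: cost of re-locating one VM (the slides state alpha > 1).
    """
    placement = dict(initial_placement)  # do not mutate the caller's dict
    cost = 0
    for event in events:
        kind = event[0]
        if kind == "comm":
            _, u, v = event
            # Assumption: only communication across servers is charged.
            if placement[u] != placement[v]:
                cost += 1
        elif kind == "move":
            _, vm, server = event
            # A move to the server the VM already sits on costs nothing.
            if placement[vm] != server:
                placement[vm] = server
                cost += alpha
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return cost


events = [("comm", "a", "b"), ("move", "b", 0), ("comm", "a", "b")]
print(total_cost({"a": 0, "b": 1}, events, alpha=2))
```

In this example the first request is paid for, the re-location is paid once, and the second request is then free because both VMs share a server.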
After all communications finished: 1 server = 1 component
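One standard way to track which VMs belong to the same component as requests arrive is a union-find structure. The sketch below is a generic textbook version (union by size with path compression), not the data structure from the talk; the class name and the request encoding are illustrative.

```python
class UnionFind:
    """Generic union-find over hashable items (here: VM identifiers)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Lazily create a singleton component for unseen items.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by size: attach the smaller component to the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


uf = UnionFind()
for u, v in [("a", "b"), ("c", "d"), ("b", "c")]:
    uf.union(u, v)
print(uf.find("a") == uf.find("d"))
```

Each communication request merges the components of its two endpoints, so after the three requests above all four VMs form a single component.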
Competitive ratio: ALG / OPT
➡ Our results are almost tight for two servers
ℓ = 2: competitive ratio O((log n)/ε), lower bound Ω(1/ε + log n)
➡ Competitive ratio O((ℓ log n log ℓ)/ε) (if ℓ = O ( εn))
➡ Efficient when ℓ is small, e.g., for communication across data centers
➡ Implementable for distributed computation: communication cost ≤ communication for re-locating VMs
Distributed Union Find Data Structure (with small cost for re-locating the sets across servers)
Online Balanced k-way Partition (with small cost for re-assigning numbers to balanced partitions)
Contains more yellow than green VMs: assign to yellow server
Ensures that we stay close to initial assignment
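The majority-voting step can be sketched as follows, assuming a component is given as a list of VMs and each VM's initial server (its "color") is known. The function name, the dictionary representation, and the tie-breaking rule are illustrative choices, not taken from the talk.

```python
from collections import Counter


def majority_server(component, initial_server):
    """Return the server on which most VMs of the component started.

    component: iterable of VM identifiers.
    initial_server: dict mapping VM -> initial server label.
    """
    counts = Counter(initial_server[vm] for vm in component)
    # Assumption: ties are broken towards the smallest server label.
    return min(counts, key=lambda s: (-counts[s], s))


initial = {"v1": "yellow", "v2": "yellow", "v3": "green"}
print(majority_server(["v1", "v2", "v3"], initial))
```

Here the component holds two initially yellow VMs and one initially green VM, so it is assigned to the yellow server and only the green VM has to move.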
For each new communication request:
server of the larger one
a power of 2: Perform majority-voting step
Find cheapest balanced assignment using brute-force enumeration
Can only happen O((log n)/ε) times
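The brute-force step can be illustrated with a small sketch. It assumes that "balanced" means no server holds more than a given capacity, and that the cost of an assignment is the number of VMs that must leave their current server. The function name, the parameters, and this cost definition are assumptions made for illustration. The enumeration is exponential in the number of components, so it only suits small inputs.

```python
from itertools import product


def cheapest_balanced_assignment(components, current_server, num_servers, capacity):
    """Try every mapping of components to servers and keep the cheapest feasible one.

    components: list of lists of VM identifiers (each list stays together).
    current_server: dict mapping VM -> index of the server it currently occupies.
    Returns (assignment, moves), or (None, None) if no mapping fits the capacity.
    """
    best, best_moves = None, None
    for assignment in product(range(num_servers), repeat=len(components)):
        # Feasibility: no server may exceed its capacity.
        load = [0] * num_servers
        for comp, server in zip(components, assignment):
            load[server] += len(comp)
        if any(l > capacity for l in load):
            continue
        # Cost: number of VMs that have to be re-located.
        moves = sum(
            1
            for comp, server in zip(components, assignment)
            for vm in comp
            if current_server[vm] != server
        )
        if best_moves is None or moves < best_moves:
            best, best_moves = assignment, moves
    return best, best_moves


components = [["a", "b"], ["c"], ["d"]]
current = {"a": 0, "b": 0, "c": 0, "d": 1}
print(cheapest_balanced_assignment(components, current, num_servers=2, capacity=2))
```

With two servers of capacity 2, the two-VM component must occupy one server alone, and keeping it on server 0 requires moving only VM "c".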
S0 S1 S2 S3 S4 S5 S6 S7
Traverse tree from root downwards and perform majority voting step at each internal node
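One possible reading of this traversal is sketched below. It assumes a balanced binary tree whose leaves are the servers S0, ..., S7, and that the vote at an internal node compares how many VMs of the component initially sat in the left subtree versus the right subtree. Both the tree shape and the voting rule are assumptions made for illustration, as are the function name and the tie-breaking towards the left.

```python
def tree_majority_server(component, initial_server, num_servers):
    """Descend from the root to a leaf, following the majority at each node.

    component: iterable of VM identifiers.
    initial_server: dict mapping VM -> initial server index in [0, num_servers).
    Servers in the half-open index range [lo, hi) form the current subtree.
    """
    lo, hi = 0, num_servers
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = sum(1 for vm in component if lo <= initial_server[vm] < mid)
        right = sum(1 for vm in component if mid <= initial_server[vm] < hi)
        # Assumption: ties go to the left subtree.
        if left >= right:
            hi = mid
        else:
            lo = mid
    return lo


initial = {"v1": 5, "v2": 5, "v3": 6, "v4": 0, "v5": 1}
print(tree_majority_server(list(initial), initial, num_servers=8))
```

In this example the right half of the tree wins at the root (three VMs against two), and the descent then narrows down to server S5.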
fundamental problems such as union find and k-way partition
O((ℓ log n log ℓ)/ε)
O((ℓ log n log ℓ)/ε). Can we shave the ℓ-factor?
can change arbitrarily over time
specific use cases
Applications
Distributed Union Find Data Structure (with small cost for re-locating the sets)
Online Balanced k-way Partition (with small cost for assigning numbers to partitions)
new model for distributed workload (re-)embedding
distributed algorithm with competitive ratio O((ℓ log n log ℓ)/ε)
Ω(n/ℓ), Õ(n/ℓ)