Toward Understanding Heterogeneity in Computing
Arnold L. Rosenberg, Ron C. Chiang
Department of Electrical and Computer Engineering
Colorado State University, Fort Collins, CO, USA
{rsnbrg, ron.chiang}@colostate.edu
Motivation
• Goal – to increase our understanding of heterogeneity in computing platforms
• Heterogeneous computing platforms
  – different computing speeds
  – architecturally balanced
“Understanding” Heterogeneity
Suppose we have
• $n + 1$ computers:
  – the server $C_0$
  – a “cluster” $C$ comprising $n$ computers, $C_1, \ldots, C_n$
• Heterogeneity profile of $C$: $\langle \rho_1, \ldots, \rho_n \rangle$
  – $C_i$ can complete one unit of work in time $\rho_i$
  – $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_n$
The Cluster-Exploitation Problem (CEP)
• $C_0$ must complete as many units of work as possible on cluster $C$ within a given lifespan of $L$ time units
• A worksharing protocol – a schedule that solves the CEP
Architectural Parameters
Fixed communication costs, negligible over a long lifespan:
• $\sigma$ – setup time
• $\lambda$ – latency
Architectural Parameters and Sample Values
Common parameters:
• $\tau$ – transmission rate (e.g., 1 μsec/work unit)
• $\delta$ – output-to-input length ratio (= 1)
For computer $C_i$:
• $\pi_i$ – packaging rate (e.g., 10 μsec/work unit)
• $\bar{\pi}_i$ – unpackaging rate (e.g., 10 μsec/work unit)
• $w_i$ – workload (work units)
Worksharing Protocols
[Figure: $C_0$ sends $w_i$ units of work to each $C_i$, taking $(\pi_0 + \tau)w_i$; $C_i$ processes them, taking $(1 + \bar{\pi})\rho_i w_i$; $C_i$ then returns $\delta w_i$ units of results, taking $(\pi\rho_i + \tau)\delta w_i$. Shown for $C_1$ and $C_n$.]
The FIFO Protocol (NOT TO SCALE)
$C_0$: sends work to $C_1$, $(\pi_0 + \tau)w_1$; sends work to $C_2$, $(\pi_0 + \tau)w_2$; sends work to $C_3$, $(\pi_0 + \tau)w_3$
$C_1$: waits; processes, $(1 + \bar{\pi})\rho_1 w_1$; results, $(\pi\rho_1 + \tau)\delta w_1$
$C_2$: waits; processes, $(1 + \bar{\pi})\rho_2 w_2$; results, $(\pi\rho_2 + \tau)\delta w_2$
$C_3$: waits; processes, $(1 + \bar{\pi})\rho_3 w_3$; results, $(\pi\rho_3 + \tau)\delta w_3$
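The timeline can be turned into timing equations. The sketch below is an illustration, not the authors' code: it assumes a tight schedule in which each $C_i$ starts returning results the moment it finishes, and the returns run back to back in FIFO order, the last one ending exactly at time $L$. It writes $A = \pi_0 + \tau$ and $B = 1 + \bar{\pi} + \pi\delta$ (the shorthand these slides introduce for the work-production formula), so sending $C_i$'s work takes $Aw_i$, unpacking, computing, and packaging take $B\rho_i w_i$, and returning results takes $\tau\delta w_i$. The function names and parameter values are hypothetical, and the sketch does not check whether $C_0$'s sends overlap the first returns.

```python
from typing import List, Sequence


def fifo_workloads(rho: Sequence[float], A: float, B: float,
                   tau_delta: float, L: float) -> List[float]:
    """Workloads w_1..w_n for a FIFO schedule with no idle time (hypothetical helper).

    Assumed timeline:
      * C0 spends A*w_i packaging and sending C_i's work, in order i = 1..n;
      * C_i spends B*rho_i*w_i unpacking, computing and packaging results;
      * C_i's results take tau_delta*w_i to transmit; the returns run back
        to back in FIFO order and the last one ends exactly at time L.
    """
    n = len(rho)
    # Unnormalized workloads from (A + B*rho_{i+1}) w_{i+1} = (B*rho_i + tau_delta) w_i,
    # which makes the finishing constraints of consecutive computers coincide.
    v = [1.0]
    for i in range(n - 1):
        v.append(v[i] * (B * rho[i] + tau_delta) / (A + B * rho[i + 1]))
    # Scale so that C_1's constraint (A + B*rho_1) w_1 + tau_delta * sum(w) = L holds.
    scale = L / ((A + B * rho[0]) * v[0] + tau_delta * sum(v))
    return [scale * x for x in v]


def return_end_times(rho: Sequence[float], w: Sequence[float], A: float,
                     B: float, tau_delta: float) -> List[float]:
    """For each C_i: when its work arrives, plus its processing time, plus the
    transmission time of the results of C_i, ..., C_n (all returned after it)."""
    times = []
    for i in range(len(rho)):
        arrived = A * sum(w[: i + 1])
        done = arrived + B * rho[i] * w[i]
        times.append(done + tau_delta * sum(w[i:]))
    return times


if __name__ == "__main__":
    rho = [1.0, 0.5, 1 / 3, 0.25]               # sample profile, slowest first
    A, B, TAU_DELTA, L = 0.2, 1.1, 0.05, 100.0  # hypothetical parameters
    w = fifo_workloads(rho, A, B, TAU_DELTA, L)
    print("workloads:", [round(x, 3) for x in w])
    print("total work:", sum(w))
    print("end times:", return_end_times(rho, w, A, B, TAU_DELTA))  # each equals L up to rounding
```

Under these assumptions the total $\sum_i w_i$ agrees, up to rounding, with the work-production formula $W = L/(\tau\delta + 1/X)$ given later in the slides.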
The FIFO Protocol is Optimal
• Theorem [Adler-Gong-Rosenberg] Over any sufficiently long lifespan $L$, for any heterogeneous cluster $C$, no matter what its heterogeneity profile:
  – FIFO worksharing protocols provide optimal solutions to the cluster-exploitation problem
  – $C$ is equally productive under every FIFO protocol, i.e., under all startup orderings
The Work-Production of FIFO
Let
$$X = \sum_{i=1}^{n} \frac{1}{(\pi_0 + \tau) + (1 + \bar{\pi} + \pi\delta)\rho_i} \prod_{j=1}^{i-1} \left( 1 - \frac{\pi_0 + \tau - \tau\delta}{(\pi_0 + \tau) + (1 + \bar{\pi} + \pi\delta)\rho_j} \right)$$
Then
$$W = L \cdot \frac{1}{\tau\delta + 1/X}$$
To simplify, let $A = \pi_0 + \tau$ and $B = 1 + \bar{\pi} + \pi\delta$. Then
$$X = \sum_{i=1}^{n} \frac{1}{A + B\rho_i} \prod_{j=1}^{i-1} \frac{B\rho_j + \tau\delta}{A + B\rho_j}$$
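The simplified formula transcribes directly into code. In this sketch, `fifo_X` carries the running product over $j < i$ while it sums, so $X$ takes one pass over the profile; the function names and the parameter values in the usage lines are hypothetical.

```python
from typing import Sequence


def fifo_X(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    """X = sum_i 1/(A + B*rho_i) * prod_{j<i} (B*rho_j + tau_delta)/(A + B*rho_j)."""
    total = 0.0
    prefix = 1.0  # product over j < i; the empty product for i = 1 is 1
    for r in rho:
        total += prefix / (A + B * r)
        prefix *= (B * r + tau_delta) / (A + B * r)
    return total


def work_production(rho: Sequence[float], A: float, B: float,
                    tau_delta: float, L: float) -> float:
    """W = L / (tau_delta + 1/X): work completed within lifespan L."""
    return L / (tau_delta + 1.0 / fifo_X(rho, A, B, tau_delta))


if __name__ == "__main__":
    A, B, TAU_DELTA = 0.2, 1.1, 0.05  # hypothetical parameters
    profile = [1.0, 0.5, 1 / 3, 0.25]
    print(work_production(profile, A, B, TAU_DELTA, L=100.0))
    # Reversing the startup order leaves X unchanged, up to rounding.
    print(fifo_X(profile, A, B, TAU_DELTA), fifo_X(profile[::-1], A, B, TAU_DELTA))
```

The order-independence in the last line matches the theorem that $C$ is equally productive under all startup orderings. Writing each product factor as $1 - (A - \tau\delta)/(A + B\rho_j)$, the sum telescopes to $X = \bigl(1 - \prod_{j=1}^{n} (B\rho_j + \tau\delta)/(A + B\rho_j)\bigr)/(A - \tau\delta)$ when $A \ne \tau\delta$, which is symmetric in the $\rho_j$.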
On Comparing Heterogeneity Profiles
• For any cluster $C$ with heterogeneity profile $P = \langle \rho_1, \ldots, \rho_n \rangle$
• $C$'s homogeneous-equivalent computing rate (HECR) is
$$\rho_C = \max\{\rho : X(P_\rho) \ge X(P)\}, \quad \text{where } P_\rho = \langle \rho, \ldots, \rho \rangle \ (n \text{ copies})$$
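Since $X$ decreases as any $\rho_j$ increases, $X$ of the homogeneous profile $\langle \rho, \ldots, \rho \rangle$ is decreasing in $\rho$. The HECR is therefore the homogeneous rate at which an $n$-computer homogeneous cluster exactly matches $X(P)$, and it lies between $\rho_n$ and $\rho_1$. A minimal bisection sketch follows; the helper names and the parameter values in the usage lines are hypothetical.

```python
from typing import Sequence


def fifo_X(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    """X(P) for the FIFO protocol, using A = pi_0 + tau and B = 1 + pibar + pi*delta."""
    total, prefix = 0.0, 1.0
    for r in rho:
        total += prefix / (A + B * r)
        prefix *= (B * r + tau_delta) / (A + B * r)
    return total


def hecr(rho: Sequence[float], A: float, B: float, tau_delta: float,
         iters: int = 200) -> float:
    """Largest homogeneous rate r with X(<r,...,r>) >= X(P), found by bisection."""
    n = len(rho)
    target = fifo_X(rho, A, B, tau_delta)
    # X(<min,...,min>) >= target >= X(<max,...,max>), so the answer is in [min, max].
    lo, hi = min(rho), max(rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fifo_X([mid] * n, A, B, tau_delta) >= target:
            lo = mid  # a cluster of n copies of mid is still at least as productive
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    A, B, TAU_DELTA = 0.2, 1.1, 0.05  # hypothetical parameters
    print(hecr([1.0, 0.5, 1 / 3, 0.25], A, B, TAU_DELTA))
    print(hecr([0.5] * 4, A, B, TAU_DELTA))  # homogeneous cluster: HECR is its own rate
```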
Heterogeneity Profiles
Profile 1: $\rho_i = (n - i + 1)/n$, which spreads evenly over the range $[1/n, 1]$; when $n = 8$: $\langle 8/8, 7/8, 6/8, \ldots, 1/8 \rangle$
Number of computers: 8 | 16 | 32
HECR: 0.362 | 0.297 | 0.251
Recall: a faster cluster has a smaller HECR value.
Heterogeneity Profiles
Profile 2: $\rho_i = 1/i$; when $n = 8$: $\langle 1/1, 1/2, 1/3, \ldots, 1/8 \rangle$
Number of computers: 8 | 16 | 32
HECR: 0.216 | 0.116 | 0.061
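The two profiles can be generated and passed to an HECR computation. The sketch below repeats a compact bisection so that it runs on its own; the helper names are hypothetical, and the parameters are an assumption drawn from the sample values in these slides ($A = 11$ μsec/work unit, $B = 1.000011$ sec/work unit, $\tau\delta = 1$ μsec with $\delta = 1$), expressed in seconds. The tables above may rest on other settings, so the printed values are not claimed to reproduce them.

```python
from typing import List, Sequence


def profile1(n: int) -> List[float]:
    """Profile 1: rho_i = (n - i + 1)/n for i = 1..n (evenly spread, slowest first)."""
    return [(n - i + 1) / n for i in range(1, n + 1)]


def profile2(n: int) -> List[float]:
    """Profile 2: rho_i = 1/i for i = 1..n."""
    return [1 / i for i in range(1, n + 1)]


def fifo_X(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    total, prefix = 0.0, 1.0
    for r in rho:
        total += prefix / (A + B * r)
        prefix *= (B * r + tau_delta) / (A + B * r)
    return total


def hecr(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    """Bisection for the largest homogeneous rate whose X still matches X(P)."""
    target = fifo_X(rho, A, B, tau_delta)
    lo, hi = min(rho), max(rho)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fifo_X([mid] * len(rho), A, B, tau_delta) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Assumed parameters (seconds per work unit), from the slides' sample values.
    A, B, TAU_DELTA = 11e-6, 1.000011, 1e-6
    for name, make in (("Profile 1", profile1), ("Profile 2", profile2)):
        row = [hecr(make(n), A, B, TAU_DELTA) for n in (8, 16, 32)]
        print(name, ["%.3f" % h for h in row])
```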
Avg. Speed vs. Std-Dev of Speed
[Figure: HECR of 8-computer clusters for average speed 0.75, 0.5, and 0.25, with speed standard deviation 0.2, 0.1, and 0.05]
100 profiles were randomly generated for each combination.
HECR of 8 computers (rows: avg. speed; columns: std-dev of speed)
Avg. speed | Std-Dev 0.2 | Std-Dev 0.1 | Std-Dev 0.05
0.75 | 0.681 | 0.735 | 0.759
0.5 | 0.411 | 0.482 | 0.501
0.25 | 0.113 | 0.208 | 0.239
The probability that these two groups have the same mean is $2 \times 10^{-10}$.
Trials with 16 and 32 computers show a similar pattern.
Speeding Up Clusters Optimally under FIFO Protocols
• Which one computer should you speed up, if you can speed up only one?
• We study two variants of this question
For convenience:
• let cluster $C$ have heterogeneity profile $P = \langle \rho_1, \ldots, \rho_n \rangle$, where $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_n$
• let $i$ and $j > i$ be two computer indices
Fixed and Proportional Speed-up
• Fixed-speedup scenario (by a fixed amount $\phi < \rho_n$):
$$P^{(i)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i - \phi, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$
$$P^{(j)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j - \phi, \rho_{j+1}, \ldots, \rho_n \rangle$$
• Proportional-speedup scenario (by a relative amount $\psi < 1$):
$$P^{[i]} = \langle \rho_1, \ldots, \rho_{i-1}, \psi\rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$
$$P^{[j]} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \psi\rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$
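In code, each scenario is a single-entry edit of the profile. A sketch with hypothetical helper names; indices are 1-based to match the slides, and the inputs are checked against the stated bounds on $\phi$ and $\psi$.

```python
from typing import List, Sequence


def fixed_speedup(rho: Sequence[float], k: int, phi: float) -> List[float]:
    """P^(k): subtract the fixed amount phi from rho_k (k is 1-based)."""
    if not 0 < phi < min(rho):  # the slides require phi < rho_n, the smallest rate
        raise ValueError("fixed speedup needs 0 < phi < rho_n")
    out = list(rho)
    out[k - 1] -= phi
    return out


def proportional_speedup(rho: Sequence[float], k: int, psi: float) -> List[float]:
    """P^[k]: multiply rho_k by the relative amount psi (k is 1-based)."""
    if not 0 < psi < 1:
        raise ValueError("proportional speedup needs 0 < psi < 1")
    out = list(rho)
    out[k - 1] *= psi
    return out


if __name__ == "__main__":
    P = [1.0, 0.5, 0.25]
    print(fixed_speedup(P, 2, 0.125))       # [1.0, 0.375, 0.25]
    print(proportional_speedup(P, 1, 0.5))  # [0.5, 0.5, 0.25]
```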
Proposition for Fixed-Speedup
• Under the fixed-speedup scenario, the most advantageous single computer to speed up is $C$'s fastest computer, $C_n$.
Terms for the Following Figures
• Recall: work production $W = L \cdot \frac{1}{\tau\delta + 1/X}$
• Work ratio – the ratio of work production after speedup to work production before speedup
• Speedup computer – the single computer that is sped up
Fixed-Speedup Scenario
[Figure: work ratio versus speedup computer (1 to 4) for profiles $\langle 1, 1/2, 1/3, 1/4 \rangle$ and $\langle 1/2, 1/4, 1/6, 1/8 \rangle$, with $\phi = 1/16$]
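The work ratios in the fixed-speedup figure can be computed from the work-production formula; $L$ cancels in the ratio. A sketch using the figure's two profiles and $\phi = 1/16$; the helper names are hypothetical, the values of $A$, $B$, and $\tau\delta$ are an assumption (the slides' sample values, in seconds), and the printed ratios are not claimed to reproduce the plotted ones. For a strictly decreasing profile and any parameters with $A > \tau\delta$, the ratio grows with the index of the speedup computer, as the fixed-speedup proposition predicts.

```python
from typing import List, Sequence


def fifo_X(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    total, prefix = 0.0, 1.0
    for r in rho:
        total += prefix / (A + B * r)
        prefix *= (B * r + tau_delta) / (A + B * r)
    return total


def work(rho: Sequence[float], A: float, B: float, tau_delta: float,
         L: float = 1.0) -> float:
    """W = L / (tau_delta + 1/X); L defaults to 1 because it cancels in ratios."""
    return L / (tau_delta + 1.0 / fifo_X(rho, A, B, tau_delta))


def fixed_speedup_ratios(rho: Sequence[float], phi: float, A: float, B: float,
                         tau_delta: float) -> List[float]:
    """Work ratio W(P^(k)) / W(P) for each speedup computer k = 1..n."""
    base = work(rho, A, B, tau_delta)
    ratios = []
    for k in range(len(rho)):
        sped = list(rho)
        sped[k] -= phi  # P^(k): computer k's rate improves by the fixed amount phi
        ratios.append(work(sped, A, B, tau_delta) / base)
    return ratios


if __name__ == "__main__":
    A, B, TAU_DELTA = 11e-6, 1.000011, 1e-6  # assumed, from the slides' sample values
    for profile in ([1, 1 / 2, 1 / 3, 1 / 4], [1 / 2, 1 / 4, 1 / 6, 1 / 8]):
        ratios = fixed_speedup_ratios(profile, 1 / 16, A, B, TAU_DELTA)
        print(profile, [round(r, 3) for r in ratios])
```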
Proposition for Proportional-Speedup
(Recall: $A = \pi_0 + \tau$, $B = 1 + \bar{\pi} + \pi\delta$, and $\rho_i > \rho_j$.)
• If $\psi\rho_i\rho_j > \tau\delta A/B^2 = 1.0 \times 10^{-5}$ – speeding up $C_j$ (faster) is better
• If $\psi\rho_i\rho_j < \tau\delta A/B^2 = 1.0 \times 10^{-5}$ – speeding up $C_i$ (slower) is better
Parameter | Rate
$A$ | 11 μsec/work unit
$B$, with coarse tasks (1 sec/task) | 1.000011 sec/work unit
That is, it is more advantageous to speed up the faster computer unless either both computers are already “very fast” or the speedup factor is “very large” (i.e., $\psi$ is very small).
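The proposition can be checked numerically: compute $X$ for the profile with $\rho_i$ replaced by $\psi\rho_i$ and for the profile with $\rho_j$ replaced by $\psi\rho_j$, and see which is larger (a larger $X$ means a larger $W$). A sketch with hypothetical helper names; the parameters are also hypothetical, chosen so that the threshold $\tau\delta A/B^2$ (about 0.0083 here) falls inside the range of $\psi\rho_i\rho_j$ values, so both outcomes occur. Indices are 0-based in the code, with $i < j$ and hence $\rho_i \ge \rho_j$.

```python
from typing import Sequence


def fifo_X(rho: Sequence[float], A: float, B: float, tau_delta: float) -> float:
    total, prefix = 0.0, 1.0
    for r in rho:
        total += prefix / (A + B * r)
        prefix *= (B * r + tau_delta) / (A + B * r)
    return total


def faster_is_better(rho_i: float, rho_j: float, psi: float,
                     A: float, B: float, tau_delta: float) -> bool:
    """The proposition's rule: speeding up the faster C_j wins iff
    psi * rho_i * rho_j > tau_delta * A / B**2."""
    return psi * rho_i * rho_j > tau_delta * A / B ** 2


def faster_is_better_brute(rho: Sequence[float], i: int, j: int, psi: float,
                           A: float, B: float, tau_delta: float) -> bool:
    """Direct check: compare X after scaling rho_j with X after scaling rho_i."""
    p_i, p_j = list(rho), list(rho)
    p_i[i] *= psi
    p_j[j] *= psi
    return fifo_X(p_j, A, B, tau_delta) > fifo_X(p_i, A, B, tau_delta)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    A, B, TAU_DELTA = 0.2, 1.1, 0.05  # hypothetical parameters
    agree = 0
    for _ in range(1000):
        rho = sorted((rng.uniform(0.01, 1.0) for _ in range(5)), reverse=True)
        i, j = sorted(rng.sample(range(5), 2))
        psi = rng.uniform(0.05, 0.95)
        rule = faster_is_better(rho[i], rho[j], psi, A, B, TAU_DELTA)
        brute = faster_is_better_brute(rho, i, j, psi, A, B, TAU_DELTA)
        agree += rule == brute
    print(agree, "of 1000 random cases agree")
```

The loop counts how often the closed-form rule and the direct comparison agree on random sorted profiles.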