[Figure: example graphs: social network, web network, DNA interaction network, follow network, user-product network]
● Nonuniform network comm costs
● Contentiousness of the memory subsystems
● Nonuniform comp requirement
● Nonuniform comm requirement
● Time-varying skewness
Architecture- and Workload-Aware Graph (Re)Partitioning
● Aragon [BigGraphs’14] (small dynamic graphs)
● Paragon [EDBT’16] (medium-size dynamic graphs)
● Planar [ICDE’16] (large dynamic graphs)
● Planar+ [To submit’17] (large dynamic graphs)
● Argo [BigData’16] (static graphs)
● Sargon [ICDE’17] (skew-resistant)
[Figure: Planar invoked at supersteps S k, S k+1, S k+2, S k+4, S k+5]
Phase-1: Logical Vertex Migration
★ Migration Planning
○ What vertices to move?
○ Where to move?
○ Phase-1a: Minimizing Comm Cost
○ Phase-1b: Ensuring Balanced Partitions
Phase-2: Physical Vertex Migration
★ Perform the Migration Plan
Phase-2: Vertex Location Update
★ Each vertex has up-to-date locations of its neighbors
Phase-3: Convergence Check
★ Still beneficial?
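The three phases can be sketched as a loop: plan moves, apply them, then check whether the result is still an improvement. This is a minimal illustration, not Planar itself. The names `comm_cost`, `plan_migration`, and `repartition`, the per-partition `capacity` bound, and the greedy gain rule are all assumptions made for the example.

```python
from collections import Counter

def comm_cost(graph, part, cost):
    """Sum, over each undirected edge, the cost between its endpoints' partitions."""
    return sum(cost[part[u]][part[v]] for u in graph for v in graph[u] if u < v)

def plan_migration(graph, part, cost, capacity):
    """Phase-1 (logical migration): decide moves without touching vertex data."""
    sizes = Counter(part.values())
    plan = {}
    for v in sorted(graph):
        src = part[v]

        def local_cost(p):
            # Cost of v's edges if v lived in partition p (neighbors stay put).
            return sum(cost[p][part[n]] for n in graph[v])

        # Phase-1a: pick the partition that minimizes v's comm cost.
        dst = min(range(len(cost)), key=local_cost)
        gain = local_cost(src) - local_cost(dst)
        # Phase-1b: only accept the move if the target stays within capacity.
        if gain > 0 and sizes[dst] < capacity:
            plan[v] = dst
            sizes[dst] += 1
            sizes[src] -= 1
    return plan

def repartition(graph, part, cost, capacity, max_rounds=10):
    part = dict(part)
    for _ in range(max_rounds):
        before = comm_cost(graph, part, cost)
        plan = plan_migration(graph, part, cost, capacity)
        if not plan:
            break
        candidate = {**part, **plan}  # Phase-2: perform the migration plan
        if comm_cost(graph, candidate, cost) >= before:
            break  # Phase-3: no longer beneficial, stop
        part = candidate
    return part
```

On two triangles joined by one edge, with vertices 2 and 3 initially on the wrong side, this sketch moves both in the first round and stops in the second because no move has positive gain.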
[Figure: Planar across supersteps S k, S k+1, S k+2, S k+4, S k+5, annotated with "Starts Repartitioning", "Physical Vertex Migration", and "Converge"]
[Figure: two machines (Machine 0, Machine 1), each with two sockets (Socket 0, Socket 1). Each socket holds several cores with private L1 and L2 caches, a shared L3 cache, a memory controller with attached memory, and a QPI/HT inter-socket link.]
[Figure: speedups of 1x, 1.18x, 1.5x, 1.7x, and 2.8x. Hours of CPU time saved: PARAGON 25h, PLANAR 27h, PLANAR+ 43h, uniPLANAR+ 10h.]
[Figure: a vertex stream feeding a partitioner]
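To make the vertex-stream model concrete, here is a minimal sketch of a one-pass streaming partitioner using the Linear Deterministic Greedy (LDG) heuristic, which this deck uses as a baseline in its SSSP comparison. It is not the partitioner shown on the slide. The function name `ldg_partition`, the `(vertex, neighbors)` stream format, and the tie-breaking rule are assumptions for illustration.

```python
def ldg_partition(stream, k, capacity):
    """Assign each streamed vertex to one of k partitions in a single pass.

    stream: iterable of (vertex, neighbors) pairs (hypothetical input format).
    capacity: maximum number of vertices allowed per partition.
    """
    parts = [set() for _ in range(k)]
    assignment = {}
    for v, neighbors in stream:
        # Full partitions are not eligible.
        candidates = [i for i in range(k) if len(parts[i]) < capacity]
        if not candidates:
            raise ValueError("all partitions are full")

        def score(i):
            # Neighbors already placed in partition i, damped by how full i is.
            placed = sum(1 for n in neighbors if assignment.get(n) == i)
            return placed * (1 - len(parts[i]) / capacity)

        # Highest score wins; ties go to the emptier partition, then lowest index.
        best = max(candidates, key=lambda i: (score(i), -len(parts[i]), -i))
        parts[best].add(v)
        assignment[v] = best
    return assignment
```

Each vertex is placed once, using only the neighbors that have already arrived, which is why such heuristics are cheap but can produce a higher edge-cut than an offline partitioner.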
[Formula not recovered; surviving labels: Bottleneck, Network, Memory]
[Figure: speedups of 50x, 38x, 12x, 9x, 9x, 6x, 4x, 3x, 1x, 1x, 1.2x, and 1x]
SSSP Execution Time (s)
m:s:c    METIS    LDG
1:2:8    633      2,632
2:2:4    654      2,565
4:2:2    521      631
8:2:1    222      280

SSSP LLC Misses (in Millions)
m:s:c    METIS    LDG
1:2:8    10,292   44,117
2:2:4    10,626   44,689
4:2:2    2,541    1,061
8:2:1    96       187

Slide annotations: 9x (execution time), 235x (LLC misses).
METIS had lower execution time than LDG in every configuration, and fewer LLC misses in every configuration except 4:2:2.
✓ Edge-cut matters.
○ Higher edge-cut → higher comm → higher contention
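The ratios behind these numbers can be recomputed directly from the table. The sketch below does that arithmetic. Reading the 9x and 235x annotations as LDG's improvement from the 1:2:8 to the 8:2:1 configuration is an assumption, chosen because it matches the table values; the dictionary and function names are hypothetical.

```python
# (METIS, LDG) per m:s:c configuration, copied from the table.
exec_time_s = {"1:2:8": (633, 2632), "2:2:4": (654, 2565),
               "4:2:2": (521, 631), "8:2:1": (222, 280)}
llc_misses_m = {"1:2:8": (10292, 44117), "2:2:4": (10626, 44689),
                "4:2:2": (2541, 1061), "8:2:1": (96, 187)}

def ldg_over_metis(table):
    """LDG / METIS ratio per configuration (values above 1 favor METIS)."""
    return {cfg: ldg / metis for cfg, (metis, ldg) in table.items()}

def ldg_first_over_last(table):
    """LDG at 1:2:8 divided by LDG at 8:2:1 (assumed meaning of the annotations)."""
    return table["1:2:8"][1] / table["8:2:1"][1]

time_ratios = ldg_over_metis(exec_time_s)
miss_ratios = ldg_over_metis(llc_misses_m)
time_drop = ldg_first_over_last(exec_time_s)    # 2632 / 280, about 9.4
miss_drop = ldg_first_over_last(llc_misses_m)   # 44117 / 187, about 235.9
```

The per-configuration ratios also show that the only entry favoring LDG is LLC misses at 4:2:2.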
● Assign a label vector to each vertex to indicate:
○ the time periods the vertex is active in
○ whether it is a high- or low-degree vertex
○ the hotness of the vertex
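One way to picture such a label vector is as a tuple of flags: one per time period, plus one for degree class and one for hotness. The sketch below is an illustration only; the slide does not specify the encoding. The class `VertexLabel`, the function `label_vertex`, the binary hotness flag, and both thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VertexLabel:
    active: tuple      # one 0/1 flag per time period
    high_degree: int   # 1 for a high-degree vertex, 0 for low-degree
    hot: int           # 1 if the vertex counts as hot, else 0

    def as_vector(self):
        """Flatten the label into a single tuple."""
        return (*self.active, self.high_degree, self.hot)

def label_vertex(active_periods, num_periods, degree, activations,
                 degree_threshold, hot_threshold):
    """Build a label from per-vertex statistics (thresholds are hypothetical)."""
    active = tuple(int(t in active_periods) for t in range(num_periods))
    return VertexLabel(active,
                       int(degree >= degree_threshold),
                       int(activations >= hot_threshold))
```

For example, a vertex active in periods 0 and 2 out of 4, with degree 120 against a threshold of 100 and 3 activations against a threshold of 5, gets the vector `(1, 0, 1, 0, 1, 0)`.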
[Figure: a vertex stream feeding a partitioner]
Workloads: BFS and SSSP (one randomly selected source vertex)
Dataset: Orkut (|V|=3M, |E|=234M)
# of Traces Collected: 5
Similarity: Percentage of the vertices overlapped in the peak superstep

Workloads    Avg. Similarity    Std. Deviation
BFS          60.80%             8.43%
SSSP         64.73%             10.63%
[Figure: speedups of 2x, 1.68x, 1.57x, and 1x]
✓ Up to 2x speedups (hours CPU time saving).
Thanks!