Planar: Parallel Lightweight Architecture-Aware Adaptive Graph - PowerPoint PPT Presentation

Planar: Parallel Lightweight Architecture-Aware Adaptive Graph Repartitioning
Angen Zheng, Alexandros Labrinidis, and Panos K. Chrysanthis
University of Pittsburgh

Graph Partitioning
Applications of Graph Partitioning
★ Scientific Simulations
★ Distributed Graph Computation
  ○ Pregel, Hama, Giraph
★ VLSI Design
★ Task Scheduling
★ Linear Programming

A Balanced Partitioning = Even Load Distribution
[Figure: example graph partitioned across nodes N1, N2, and N3]
Balanced:

Minimal Edge-Cut = Minimal Data Comm
[Figure: example graph partitioned across nodes N1, N2, and N3]
Minimizing Edge-Cut:
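The two objectives on the slides above can be illustrated with a small sketch. The formulas themselves did not survive in the slide text, so the definitions below are assumptions: edge-cut is the number of edges whose endpoints sit in different partitions, and balance is measured as the largest partition size divided by the average size, with unit vertex weights. The graph and the helper names are hypothetical.

```python
from collections import Counter


def edge_cut(edges, part):
    """Count edges whose two endpoints are placed in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, num_parts):
    """Largest partition size divided by the ideal (average) size.

    Assumes unit vertex weights; 1.0 means perfectly balanced.
    """
    sizes = Counter(part.values())
    ideal = len(part) / num_parts
    return max(sizes.values()) / ideal


# Hypothetical 6-vertex ring split across N1, N2, N3.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("a", "f")]
part = {"a": "N1", "b": "N1", "c": "N2", "d": "N2", "e": "N3", "f": "N3"}

cut = edge_cut(edges, part)
skew = imbalance(part, 3)
```

In this toy placement every partition holds two vertices, so the load is even, and three of the six edges cross partition boundaries.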

Minimal Edge-Cut = Minimal Data Comm
But Minimal Data Comm ≠ Minimal Comm Cost
Figure 1. Pair-Wise Network Bandwidth (J. Xue, BigData’15). STD DEV per panel: 269.71Mb/s, 416.82Mb/s, 358.34Mb/s
★ Group neighboring vertices as close as possible
★ The partitioner has to be Architecture-Aware
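A minimal sketch of why equal edge-cut does not imply equal communication cost. The relative link costs (6 between N1 and N2, 1 otherwise) are borrowed from the Phase-1a worked example later in the deck; the graph, the two placements, and the function name `comm_cost` are hypothetical.

```python
def comm_cost(edges, part, cost):
    """Sum the network cost of every edge that crosses partitions."""
    total = 0
    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu != pv:
            total += cost[frozenset((pu, pv))]
    return total


# Assumed relative network costs between the three nodes.
cost = {
    frozenset(("N1", "N2")): 6,
    frozenset(("N1", "N3")): 1,
    frozenset(("N2", "N3")): 1,
}

edges = [("a", "b"), ("c", "d")]

# Both placements cut exactly two edges ...
far = {"a": "N1", "b": "N2", "c": "N1", "d": "N2"}   # cut edges use the slow N1-N2 link
near = {"a": "N1", "b": "N3", "c": "N2", "d": "N3"}  # cut edges use the cheap links

# ... but the cost of the traffic differs by a factor of six.
far_cost = comm_cost(edges, far, cost)
near_cost = comm_cost(edges, near, cost)
```

An edge-cut-only partitioner cannot tell these two placements apart, which is the motivation for making the partitioner architecture-aware.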

Overview of the State-of-the-Art
Balanced Graph (Re)Partitioning
★ Partitioners (static graphs)
★ Repartitioners (dynamic graphs)
Each group splits into:
★ Offline Methods (High Quality, Poor Scalability)
★ Online Methods (Moderate Quality or Moderate~High Quality, High Scalability)
[Diagram labels: Architecture-Aware, Architecture-Aware]

Roadmap
★ Introduction
★ Planar
★ Evaluation
★ Conclusions

Planar: Problem Statement
Given G = (V, E) and an initial Partitioning P:
★ Balancing Load:
★ Minimizing Communication: Network Cost
★ Minimizing Migration:

Planar: Overview
[Figure: Planar runs at each of the steps S_k, S_k+1, S_k+2, S_k+4, S_k+5]
Phase-1: Logical Vertex Migration
★ Migration Planning
  ○ What vertices to move?
  ○ Where to move?
  ○ Phase-1a: Minimizing Comm Cost
  ○ Phase-1b: Ensuring Balanced Partitions
Phase-2: Physical Vertex Migration
★ Perform the Migration Plan
Phase-3: Convergence Check
★ Still beneficial?

Phase-1a: Minimizing Comm Cost
[Figure: example graph partitioned across N1, N2, and N3, annotated with relative network costs of 1 and 6 between nodes]

Phase-1a: Minimizing Comm Cost
★ Run Planar on each partition in Parallel
  ○ Each boundary vertex of my partition
    ■ make a migration decision on my own
    ■ Probabilistic vertex migration
[Figure: example graph across N1, N2, and N3 with relative network costs of 1 and 6]

Phase-1a@N1: Use vertex a as an example
g(a, N1, N1) = 0
Max Gain: 0
Optimal Dest: N1

Phase-1a@N1: Move vertex a to N2?
old_comm(a, N1) = 2 * 6 + 1 * 1 = 13
new_comm(a, N2) = 1 * 6 + 1 * 1 = 7
mig(a, N1, N2) = 1 * 6 = 6
g(a, N1, N2) = 13 - 7 - 6 = 0
Max Gain: 0
Optimal Dest: N1

Phase-1a@N1: Move vertex a to N3?
old_comm(a, N1) = 2 * 6 + 1 * 1 = 13
new_comm(a, N3) = 1 * 1 + 2 * 1 = 3
mig(a, N1, N3) = 1 * 1 = 1
g(a, N1, N3) = 13 - 3 - 1 = 9
Max Gain: 9
Optimal Dest: N3
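The gain arithmetic for vertex a can be reproduced with a short sketch. The link costs (6 between N1 and N2, 1 for the other two pairs) and the neighbor counts of a (one in N1, two in N2, one in N3) are inferred from the numbers on the slides, not stated there directly; the function names and the unit vertex size are assumptions. Ties keep the current partition, matching the slides where a gain of 0 for N2 leaves the optimal destination at N1.

```python
# Relative network cost between nodes, inferred from the worked example.
COST = {
    frozenset(("N1", "N2")): 6,
    frozenset(("N1", "N3")): 1,
    frozenset(("N2", "N3")): 1,
}


def link_cost(p, q):
    """Cost of one unit of traffic between partitions p and q (0 if co-located)."""
    return 0 if p == q else COST[frozenset((p, q))]


def comm(neighbors, where):
    """Comm cost of a vertex placed at `where`.

    `neighbors` maps each partition to the number of the vertex's
    neighbors that live there.
    """
    return sum(n * link_cost(where, p) for p, n in neighbors.items())


def gain(neighbors, src, dst, size=1):
    """g = old_comm - new_comm - migration cost, as on the slides."""
    mig = size * link_cost(src, dst)
    return comm(neighbors, src) - comm(neighbors, dst) - mig


def best_destination(neighbors, src, candidates):
    """Pick the destination with the largest gain; ties stay at `src`."""
    best, best_gain = src, 0
    for dst in candidates:
        g = gain(neighbors, src, dst)
        if g > best_gain:
            best, best_gain = dst, g
    return best, best_gain


# Vertex a lives on N1 with 1 neighbor on N1, 2 on N2, and 1 on N3.
a_neighbors = {"N1": 1, "N2": 2, "N3": 1}
dest, max_gain = best_destination(a_neighbors, "N1", ["N1", "N2", "N3"])
```

Because each vertex only needs its own neighbor counts and the cost table, every partition can evaluate its boundary vertices independently, which is what lets Phase-1a run in parallel.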

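Phase-1a: Probabilistic Vertex Migration
Migration Planning
Boundary Vtx | Partition | Migration Dest | Gain | Max Gain | Probability
a | N1 | N3 | 9 | 9 | 9/9
b | N2 | N3 | 2 | 3 | 2/3
d | N2 | N3 | 3 | 3 | 3/3
e | N3 | N3 | 0 | 0 | 0
g | N3 | N3 | 0 | 0 | 0
Migrate with a probability proportional to the gain
[Figure: example graph across N1, N2, and N3 with relative network costs of 1 and 6]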
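A minimal sketch of the probabilistic step, assuming the probability is each vertex's gain divided by the maximum gain within its own partition, which is what the ratios 9/9, 2/3, and 3/3 suggest. The grouping of b and d under N2 and of e and g under N3 is read off the slide's table layout; the function names are hypothetical.

```python
import random
from fractions import Fraction


def migration_probabilities(plan):
    """Map each vertex to gain / (max gain in its partition).

    `plan` maps vertex -> (current partition, gain of its best move).
    A partition whose best gain is 0 migrates nothing.
    """
    max_gain = {}
    for part, g in plan.values():
        max_gain[part] = max(max_gain.get(part, 0), g)
    probs = {}
    for v, (part, g) in plan.items():
        top = max_gain[part]
        probs[v] = Fraction(g, top) if top > 0 else Fraction(0)
    return probs


def pick_migrants(probs, rng):
    """Draw one uniform sample per vertex and keep those under their probability."""
    return [v for v, p in probs.items() if rng.random() < p]


# Values from the migration-planning table.
plan = {
    "a": ("N1", 9),
    "b": ("N2", 2),
    "d": ("N2", 3),
    "e": ("N3", 0),
    "g": ("N3", 0),
}
probs = migration_probabilities(plan)
migrants = pick_migrants(probs, random.Random(0))
```

Vertices a and d are always selected, e and g never are, and b is selected in roughly two out of three runs.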

Phase-1b: Balancing Partitions
★ Quota-Based Vertex Migration
  ○ Q1: How much work should each overloaded partition migrate to each underloaded partition?
    ■ Potential Gain Computation
      ● Similar to Phase-1a vertex gain computation
    ■ Iteratively allocate quota starting from the partition pair having the largest gain.
  ○ Q2: What vertices to migrate?
    ■ Phase-1a vertex migration, but limited by the quota.
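The quota allocation in Q1 could be sketched as a greedy pass over partition pairs. This is only one plausible reading of the slide: the slides do not say how surplus, deficit, or potential gain are represented, so the inputs, the numbers, and the name `allocate_quota` are all assumptions made for illustration.

```python
def allocate_quota(surplus, deficit, pair_gain):
    """Greedily assign migration quota to (overloaded, underloaded) pairs.

    surplus:   overloaded partition  -> load it must shed
    deficit:   underloaded partition -> load it can still accept
    pair_gain: (src, dst)            -> potential gain of moving load src -> dst

    Pairs are visited from the largest potential gain downward, and each
    pair receives as much quota as both sides still allow.
    """
    surplus = dict(surplus)  # copy so the caller's dicts are not mutated
    deficit = dict(deficit)
    quota = {}
    ranked = sorted(pair_gain.items(), key=lambda kv: kv[1], reverse=True)
    for (src, dst), _gain in ranked:
        amount = min(surplus.get(src, 0), deficit.get(dst, 0))
        if amount > 0:
            quota[(src, dst)] = amount
            surplus[src] -= amount
            deficit[dst] -= amount
    return quota


# Hypothetical loads: N1 must shed 3 units; N2 and N3 can each take 2.
quota = allocate_quota(
    surplus={"N1": 3},
    deficit={"N2": 2, "N3": 2},
    pair_gain={("N1", "N2"): 4, ("N1", "N3"): 7},
)
```

The pair (N1, N3) has the larger potential gain, so it is served first and fills N3's capacity; the remaining unit goes to N2. Q2 would then run the Phase-1a vertex selection within each pair until its quota is used up.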

Planar: Physical Vertex Migration
Phase-2: Physical Vertex Migration
★ Perform the Migration Plan

Planar: Convergence Check
Phase-3: Convergence Check
★ Still beneficial?

Phase-3: Convergence
[Figure: a repartitioning epoch over S_k, S_k+1, S_k+2, S_k+4, S_k+5, with labels "Enough changes (structure/load)" and "Converge"]
★ Converge
  ○ improvement achieved per adaptation superstep < 𝜀
  ○ after 𝜐 consecutive adaptation supersteps
𝜀 = 1% and 𝜐 = 10 (via Sensitivity Analysis)
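One way to read the convergence rule is: stop once the per-superstep improvement has stayed below 𝜀 for 𝜐 supersteps in a row. The sketch below follows that reading, which is an assumption; how the improvement itself is measured is not given on the slide, so it is passed in as a list of fractions, and the function name is hypothetical.

```python
def converged(improvements, eps=0.01, upsilon=10):
    """Return True once `upsilon` consecutive supersteps each improved by < eps.

    `improvements` holds the relative improvement of each adaptation
    superstep, oldest first (for example 0.05 for a 5% improvement).
    """
    streak = 0
    for imp in improvements:
        # A superstep with enough improvement resets the streak.
        streak = streak + 1 if imp < eps else 0
        if streak >= upsilon:
            return True
    return False


# Two productive supersteps followed by ten with under 1% improvement.
done = converged([0.20, 0.05] + [0.005] * 10)

# A single productive superstep in the middle resets the count.
not_done = converged([0.005] * 5 + [0.02] + [0.005] * 9)
```

Requiring a run of quiet supersteps, rather than a single one, guards against stopping on a superstep where the probabilistic migration happened to move few vertices.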

Evaluation
★ Microbenchmarks
  ○ Convergence Study (Param Selection)
  ○ Partitioning Quality
★ Real-World Workloads
  ○ Breadth First Search (BFS)
  ○ Single Source Shortest Path (SSSP)
★ Scalability Test
  ○ Scalability vs Graph Size
  ○ Scalability vs # of Partitions
  ○ Scalability vs Graph Size and # of Partitions

Partitioning Quality: Setup
Dataset | 12 datasets from various areas
# of Parts | 40 (two 20-core machines)
Initial Partitioners | HP: Hashing Partitioning; DG: Deterministic Greedy; LDG: Linear Deterministic Greedy

Partitioning Quality: Datasets
Dataset | |V| | |E| | Description
wave | 156,317 | 2,118,662 | FEM
auto | 448,695 | 6,629,222 | FEM
333SP | 3,712,815 | 22,217,266 | FEM
CA-CondMat | 108,300 | 373,756 | Collaboration Network
DBLP | 317,080 | 1,049,866 | Collaboration Network
Email-Enron | 36,692 | 183,831 |
as-skitter | 1,696,415 | 22,190,596 | Internet Topology
Amazon | 334,863 | 925,872 | Product Network
USA-roadNet | 23,947,347 | 58,333,344 | Road Network
roadNet-PA | 1,090,919 | 6,167,592 | Road Network
YouTube | 3,223,589 | 24,447,548 | Social Network
Com-LiveJournal | 4,036,537 | 69,362,378 | Social Network
Friendster | 124,836,180 | 3,612,134,270 | Social Network

Partitioning Quality: Planar achieved up to 68% improvement
Improv. | Max | Avg.
HP | 68% | 53%
DG | 46% | 24%
LDG | 69% | 48%

Real-World Workload: Setup
Cluster Configuration | PittMPICluster (FDR Infiniband) | Gordon (QDR Infiniband)
# of Nodes | 32 | 1024
Network Topology | Single Switch (32 nodes / switch) | 4*4*4 3D Torus of Switches (16 nodes / switch)
Network Bandwidth | 56Gbps | 8Gbps

Node Configuration | PittMPICluster (Intel Haswell) | Gordon (Intel Sandy Bridge)
# of Sockets | 2 (10 cores / socket) | 2 (8 cores / socket)
L3 Cache | 25MB | 20MB
Memory Bandwidth | 65GB/s | 85GB/s

Planar: Avoiding Resource Contention on the Memory Subsystems of Multicore Machines
System Bottleneck (A. Zheng, EDBT’16)
★ PittMPICluster: Memory (λ=1)
★ Gordon: Network (λ=0)
[Formula labels: Degree of Contention, Intra-Node Network Comm Cost, Maximal Inter-Node Network Comm Cost]

Real-World Workload: Baselines
Balanced Graph (Re)Partitioning
★ Partitioners (static graphs)
★ Repartitioners (dynamic graphs)
Each group splits into:
★ Offline Methods (High Quality, Poor Scalability)
★ Online Methods (Moderate Quality or Moderate~High Quality, High Scalability)
[Diagram label: uniPlanar]
Initial Partitioner: DG

BFS Exec. Time on PittMPICluster (λ=1): Planar achieved up to 9x speedups
★ as-skitter: |V|=1.6M, |E| = 22M
★ 60 Partitions: three 20-core machines
[Chart: speedup labels 9x, 7.5x, 5.8x, 4.1x, 1.48x, 1.37x, 1x]

BFS Comm Volume on PittMPICluster (λ=1): Planar had the lowest intra-node comm volume
★ as-skitter: |V|=1.6M, |E| = 22M
★ 60 Partitions: three 20-core machines
Reduction | Intra-Socket | Inter-Socket
DG | 51% | 38%
METIS | 51% | 36%
PARMETIS | 47% | 34%
uniPLANAR | 44% | 28%
ARAGON | 4.3% | 0.8%
PARAGON | 5.2% | 2.6%

BFS Exec. Time on Gordon (λ=0): Planar achieved up to 3.2x speedups
★ as-skitter: |V|=1.6M, |E| = 22M
★ 48 Partitions: three 16-core machines
[Chart: speedup labels 3.2x, 1.16x, 1.21x, 1x, 1.05x]

BFS Comm. Volume on Gordon (λ=0): Planar had the lowest inter-node comm volume
★ as-skitter: |V|=1.6M, |E| = 22M
★ 48 Partitions: three 16-core machines
[Chart: reduction labels 51%, 11%, 0.1%, 25%]

Conclusions
★ PLANAR
  ○ Architecture-Aware Adaptive Graph Repartitioner
    • Communication Heterogeneity
    • Shared Resource Contention
  ○ Up to 9x speedups on real-world workloads.
  ○ Scaled up to a graph with 3.6B edges.
Acknowledgments:
  ○ Peyman Givi
  ○ Patrick Pisciuneri
  ○ Mark Silvis
Funding:
  ○ NSF OIA-1028162
  ○ NSF CBET-1250171
