Dependences for Parallelization Kaushik Rajan Abhishek Udupa - PowerPoint PPT Presentation

ALTER: Exploiting Breakable Dependences for Parallelization
Kaushik Rajan, Abhishek Udupa, William Thies
Rigorous Software Engineering, Microsoft Research, India

Parallelization Reconsidered
Sequential program: are there dependences between loop iterations?
• No: DOALL parallelism
• Yes, and the dependences can be reordered: commutativity analysis (no speedup)
• Yes, and the dependences are imprecise: speculation (no speedup)
• Yes, and the dependences can be broken: our technique, ALTER, breaks dependences (2.0x speedup on four cores)
Benchmarks shown: SG3D, Floyd-Warshall, Agglomerative Clustering, Gauss-Seidel, K-Means

Outline
• Breakable Dependences: Stale Reads
• Deterministic Runtime System
• Assisted Parallelization
• Results
Other details are in the paper.

Breakable Dependences in an Iterative Convergence Algorithm
    while (!converged) {
        for i = 1 to n {
            refine(soln[i])
        }
    }
Examples:
• Floyd-Warshall algorithm
• Monotonic data-flow analyses
• Linear algebra solvers
• Stencil computations
[Figure: three DO WHILE timelines over iterations I(1), I(2), …, I(n); labels: sequential, ALTER: stale reads, privatized, shared memory, merge.]
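The difference between the sequential loop and a stale-read loop can be sketched on a small 1-D relaxation. This is only an illustration under stated assumptions: `refine`, `sweep_sequential`, `sweep_stale`, and `iterate` are hypothetical names, the neighbour-averaging update is not taken from the slides, and the stale variant runs on one thread and simply reads from a copy taken before the sweep.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical refinement step: replace an element by the mean of its neighbours.
double refine(const std::vector<double>& view, std::size_t i) {
    return 0.5 * (view[i - 1] + view[i + 1]);
}

// Sequential sweep: iteration i sees the values already written by iterations < i.
void sweep_sequential(std::vector<double>& soln) {
    for (std::size_t i = 1; i + 1 < soln.size(); ++i)
        soln[i] = refine(soln, i);
}

// Stale-read sweep: every iteration reads the snapshot taken before the sweep,
// so the loop-carried dependence is broken and the iterations are independent.
void sweep_stale(std::vector<double>& soln) {
    const std::vector<double> snapshot = soln;
    for (std::size_t i = 1; i + 1 < soln.size(); ++i)
        soln[i] = refine(snapshot, i);
}

// Repeats a sweep until no element changes by more than eps.
// Returns the number of sweeps performed.
template <typename Sweep>
int iterate(std::vector<double>& soln, Sweep sweep, double eps) {
    int sweeps = 0;
    double change;
    do {
        const std::vector<double> before = soln;
        sweep(soln);
        change = 0.0;
        for (std::size_t i = 0; i < soln.size(); ++i)
            change = std::max(change, std::fabs(soln[i] - before[i]));
        ++sweeps;
    } while (change > eps);
    return sweeps;
}
```

With fixed boundary values, both sweeps approach the same fixed point in this toy setting; the stale variant may need more sweeps, which is the price paid for independent iterations.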

Stale Reads Execution Model
[Figure: iterations labeled 1 to 8 with write sets $W_1$ and $W_2$, where $W_1 \cap W_2 = \emptyset$, connected by stale reads.]
• An execution is valid under the StaleReads model iff
– The commit order is some serial order of the iterations (it can differ from the sequential order)
– Each iteration reads a stale but consistent snapshot
– Staleness is bounded: no intersecting writes by intervening iterations
• Akin to Snapshot Isolation for databases
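The validity conditions can be read as a check over a commit history. The following is a minimal sketch, not the ALTER implementation: the `Iteration` record and `valid_stale_reads` are hypothetical, memory locations are modelled as integers, and a snapshot is identified by the number of commits it already reflects.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical record of one loop iteration.
struct Iteration {
    std::size_t snapshot;   // number of commits visible in the snapshot it read
    std::set<int> writes;   // locations it wrote
};

// commit_order[k] is the k-th iteration to commit (some serial order).
// The history is valid if no iteration that committed after the snapshot was
// taken, and before this iteration commits, wrote a location this one writes.
bool valid_stale_reads(const std::vector<Iteration>& commit_order) {
    for (std::size_t k = 0; k < commit_order.size(); ++k) {
        const Iteration& it = commit_order[k];
        if (it.snapshot > k) return false;  // cannot observe future commits
        for (std::size_t m = it.snapshot; m < k; ++m)   // intervening commits
            for (int loc : commit_order[m].writes)
                if (it.writes.count(loc) != 0) return false;
    }
    return true;
}
```

Two iterations that start from the same snapshot and write disjoint locations pass the check; if they write the same location, the later committer fails, much like first-committer-wins under snapshot isolation.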

Stale Reads with Reduction
[Figure: iterations labeled 1 to 8 with write sets $W_1$, $W_2$ and their reduction subsets $W_1^R$, $W_2^R$.]
• $W_1^R \subseteq W_1$, $W_2^R \subseteq W_2$
• $(W_1 \setminus W_1^R) \cap (W_2 \setminus W_2^R) = \emptyset$
• reduction $R := (var, O)$ where
1. Every access to var is an update using operation O
2. Operator O is commutative and associative
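Because O is commutative and associative, per-iteration partial results for a reduction variable can be combined in any commit order. A minimal sketch, assuming each iteration accumulates into a private partial value; `merge_reduction` is a hypothetical helper, not part of ALTER:

```cpp
#include <vector>

// Folds the private partial results of the iterations into the shared value.
// For a commutative and associative op, the order of `partials` does not
// change the result (up to floating-point rounding for operators such as +).
template <typename T, typename Op>
T merge_reduction(T shared, const std::vector<T>& partials, Op op) {
    for (const T& p : partials) shared = op(shared, p);
    return shared;
}
```

For the `(error, max)` annotation this would correspond to taking the maximum over the per-iteration errors, and for `(delta, +)` to summing the per-iteration deltas.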

Deterministic Runtime System
• FORK(): each iteration gets a private copy of the state
• EXECUTE(): body(1), body(2), body(3) run with read/write (RW) logging
• JOIN(): a Commit? check is applied to each of iterations 1, 2, 3 before the state is joined
• StaleReads Commit(i): $\forall j \text{ s.t. } j < i:\ writes_i \cap writes_j = \emptyset$
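One round of fork, execute, and join can be sketched as follows. This is a single-threaded illustration under several assumptions: `State`, `Body`, and `run_round` are hypothetical names, memory is a map from integer addresses to values, commits are attempted in iteration order, and a write set is compared only against iterations that already committed in the same round. The real runtime executes the bodies concurrently on private memory.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <vector>

using State = std::map<int, double>;   // address -> value (hypothetical flat memory)
// A loop body reads only the snapshot and records its writes in a private log.
using Body = std::function<void(const State& snapshot, State& writes)>;

// Runs one round and returns the indices of iterations that failed to commit.
std::vector<std::size_t> run_round(State& state, const std::vector<Body>& bodies) {
    const State snapshot = state;                    // FORK: private view per iteration
    std::vector<State> logs(bodies.size());
    for (std::size_t i = 0; i < bodies.size(); ++i)  // EXECUTE: bodies are independent
        bodies[i](snapshot, logs[i]);

    std::set<int> committed;                         // addresses written by committed iterations
    std::vector<std::size_t> retry;
    for (std::size_t i = 0; i < bodies.size(); ++i) {  // JOIN: fixed, deterministic order
        bool conflict = false;
        for (const auto& kv : logs[i])
            if (committed.count(kv.first) != 0) conflict = true;
        if (conflict) { retry.push_back(i); continue; }
        for (const auto& kv : logs[i]) {             // commit the logged writes
            state[kv.first] = kv.second;
            committed.insert(kv.first);
        }
    }
    return retry;
}
```

Because the commit order is fixed, the same inputs always yield the same committed state and the same set of iterations to re-execute, which is one way to keep the round deterministic.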

Alter Annotations
    while (error >= EPSILON) {  // convergence loop
        error = 0.0;
        for (uint32_t i = 1; i < grid->xmax - 1; ++i) {
            [StaleReads, (error, max)]
            for (uint32_t j = 1; j < grid->ymax - 1; ++j) {
                for (uint32_t k = 1; k < grid->zmax - 1; ++k) {
                    oldValue = grid[i][j][k];
                    grid[i][j][k] = a * grid[i][j][k] + b * AddDirectNbr(grid)
                                  + c * AddSquareNbr(grid) + d * AddCubeNbr(grid);
                    error = max(error, fabs(oldValue - grid[i][j][k]));
                }
            }
        }
    }

Test Driven Parallelism Inference
• Inputs: sequential program and test suite
• Exhaustive parallelization engine
– For each annotation, run all test cases and record the outcome
– The outcome of a single run is success or failure, where failure ∈ {crash, timeout, high contention, output mismatch}
– Output mismatch: an assertion failure, or a floating-point difference outside the 0.01% tolerance
• Output: candidate parallel program, passed on to user validation
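The output comparison could look roughly like the sketch below. The 0.01% figure comes from the slide; everything else is an assumption: the name `outputs_match`, the use of a relative difference scaled by the larger magnitude, and the treatment of a length mismatch as a failure.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if the parallel output agrees with the sequential output
// within a relative tolerance (1e-4 corresponds to 0.01%).
bool outputs_match(const std::vector<double>& sequential,
                   const std::vector<double>& parallel,
                   double rel_tol = 1e-4) {
    if (sequential.size() != parallel.size()) return false;
    for (std::size_t i = 0; i < sequential.size(); ++i) {
        const double diff = std::fabs(sequential[i] - parallel[i]);
        const double scale = std::max(std::fabs(sequential[i]), std::fabs(parallel[i]));
        if (diff > rel_tol * scale) return false;   // outside the tolerance
    }
    return true;
}
```

An annotation would then count as a candidate only if every test case in the suite finishes without a crash, timeout, or high contention and `outputs_match` holds for its output.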

Assisted Parallelism
• ALTER (assisted parallelism): sequential program + test suite → exhaustive parallelization engine → candidate parallel program → user validation → auto-tune for performance. Preserves program functionality.
• Prior art (automatic parallelism): sequential program → conservative compiler analysis → parallel program. Preserves program dependences.

Benchmarks
BENCHMARK       ALGORITHM TYPE        PARALLELISM                 LOOP WEIGHT
AggloClust      Branch & bound        STALE READS                 89%
GSdense         Dense algebra         STALE READS                 100%
GSsparse        Sparse algebra        STALE READS                 100%
FloydWarshall   Dynamic programming   STALE READS                 100%
SG3D            Structured grids      STALE READS, (error, max)   96%
BarnesHut       N-body methods        DOALL                       99.6%
FFT             Spectral methods      DOALL                       100%
HMM             Graphical models      DOALL                       100%
Genome          Bioinformatics        STALE READS                 89%
SSCA2           Scientific            STALE READS                 76%
K-means         Data mining           STALE READS, (delta, +)     89%
Labyrinth       Engineering           -                           99%

Experimental Setup
• Experiments on a 2 x quad-core Xeon processor
• ALTER transformations implemented in the Microsoft Phoenix compiler framework
• Comparison with dependence speculation and with manual parallelization of 2 applications

Results: Baseline
[Bar chart; vertical axis 0 to 6; legend: staleReads, OutOfOrder, speculate, DOALL. Two callouts: "No scope for dependence speculation".]

Results: ALTER
[Bar chart; vertical axis 0 to 6; legend: staleReads, OutOfOrder, speculate, DOALL.]

Results: Manual Parallelization
[Bar chart; vertical axis 0 to 6; legend: staleReads, OutOfOrder, speculate, DOALL. Callouts: "Good speedup with manual fine grain locking" and "Comparable performance".]

In the Paper…
• ALTER multi-process memory allocator
• ALTER collections
• Usage scenarios for ALTER
• Profiling and instrumentation overhead
• DOALL parallelism and speculation within ALTER

Related Work
• Test-driven parallelization
– QuickStep: similar testing methods for non-deterministic programs, offers accuracy bounds [Rinard 2010]
• Assisted parallelization [Taylor 2011] [Tournavitis 2009]
– Paralax: annotations improve precision of analysis, but dependences are respected [Vandierendonck 2010]
• Implicit parallelization [Burckhardt 2010]
– Commutative annotation for reordering [August 2007, 11]
– Optimistic execution of irregular programs [Pingali 2008]
• As far as we know, the stale reads execution model is new

Conclusions
• Breakable dependences must be exploited in order to parallelize certain classes of programs
• We propose a new execution model, StaleReads, that violates dependences in a principled way
• It adopts the database notion of Snapshot Isolation for loop parallelization
• ALTER is a compiler and deterministic runtime system that discovers new parallelism in programs
• We believe tools for assisted parallelism can help to overcome the limits of automatic parallelization

