Pipelined Multithreading Generation in a Polyhedral Compiler
January 22nd 2020, IMPACT'20, HiPEAC, Bologna, Italy
Harenome Ranaivoarivony-Razanajato, Cédric Bastoul, Vincent Loechner
University of Strasbourg and Inria Nancy Grand Est
Team ICPS | Scientific and Parallel Computing, University of Strasbourg

Motivating Example

(a) Sequential Program

    for (int i = 1; i < N; ++i)
        A[i] = f1(A[i], A[i - 1]); // S1
    for (int i = 1; i < N; ++i)
        B[i] = f2(A[i], B[i - 1]); // S2
    /* ... */
    for (int i = 1; i < N; ++i)
        F[i] = f6(E[i], F[i - 1]); // S6

(b) Dependency Graph: S1 → S2 → … → S6

Pipelined Multithreading Generation in a Polyhedral Compiler, Harenome Razanajato et al.

Motivating Example: Pipelined Execution

(a) Sequential Program: the same loops S1 to S6 as in the motivating example above.

(b) Pipelined Execution (time runs left to right; Sk(c) is stage Sk run by thread c)

    thread 1:  S1(1)  S2(1)  S3(1)
    thread 2:         S1(2)  S2(2)
    thread 3:                S1(3)

Motivating Example: Pipelined OpenMP target program

(a) Sequential Program: the same loops S1 to S6 as in the motivating example above.

(b) Pipelined OpenMP target program

    #pragma omp parallel
    {
        #pragma omp for schedule(static) ordered nowait
        for (int i = 1; i < N; ++i)
            #pragma omp ordered
            A[i] = f1(A[i], A[i - 1]); // S1
        #pragma omp for schedule(static) ordered nowait
        for (int i = 1; i < N; ++i)
            #pragma omp ordered
            B[i] = f2(A[i], B[i - 1]); // S2
        /* ... */
        #pragma omp for schedule(static) ordered nowait
        for (int i = 1; i < N; ++i)
            #pragma omp ordered
            F[i] = f6(E[i], F[i - 1]); // S6
    }

Speedup: 2.89 with 6 stages on an Intel Xeon E5-2620v3 @ 2.40 GHz, with N = 100,000
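The pattern on this slide can be tried on a reduced, two-stage version. The sketch below is illustrative only: f1 and f2 are hypothetical stand-ins (the slides do not define them), unsigned arithmetic is used so that results compare exactly, and the function name is an assumption. It runs the two loops sequentially, then with the ordered/nowait pragmas, and checks that both give the same arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stage functions; the slides leave f1 and f2 unspecified. */
static unsigned f1(unsigned x, unsigned prev) { return 31u * x + prev; }
static unsigned f2(unsigned x, unsigned prev) { return x ^ (17u * prev); }

/* Returns 1 if the pipelined loops produce the same A and B as the
 * sequential loops, 0 otherwise (or if allocation fails). */
int pipeline_matches_sequential(int n)
{
    assert(n >= 2);
    size_t len = (size_t)n;
    unsigned *A = malloc(len * sizeof *A), *B = malloc(len * sizeof *B);
    unsigned *As = malloc(len * sizeof *As), *Bs = malloc(len * sizeof *Bs);
    int same = 0;

    if (A && B && As && Bs) {
        for (int i = 0; i < n; ++i) {
            A[i] = As[i] = (unsigned)i % 7u;
            B[i] = Bs[i] = (unsigned)i % 5u;
        }

        /* Sequential reference: S1, then S2. */
        for (int i = 1; i < n; ++i)
            As[i] = f1(As[i], As[i - 1]);
        for (int i = 1; i < n; ++i)
            Bs[i] = f2(As[i], Bs[i - 1]);

        /* Pipelined version: each loop is sequentialized by `ordered`,
         * while `nowait` lets a thread that finished its chunk of S1
         * start on the same chunk of S2. */
        #pragma omp parallel
        {
            #pragma omp for schedule(static) ordered nowait
            for (int i = 1; i < n; ++i) {
                #pragma omp ordered
                A[i] = f1(A[i], A[i - 1]); /* S1 */
            }
            #pragma omp for schedule(static) ordered nowait
            for (int i = 1; i < n; ++i) {
                #pragma omp ordered
                B[i] = f2(A[i], B[i - 1]); /* S2 */
            }
        }

        same = 1;
        for (int i = 0; i < n; ++i)
            if (A[i] != As[i] || B[i] != Bs[i])
                same = 0;
    }
    free(A); free(B); free(As); free(Bs);
    return same;
}
```

Calling pipeline_matches_sequential(100000) exercises the same problem size as the slide. Built without OpenMP support (for example without -fopenmp on GCC or Clang) the pragmas are ignored and the check still passes, only sequentially; the sketch says nothing about the speedup reported above.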

Goals
• Identify software pipelines in a polyhedral compiler
• Generate pipelined multithreading using OpenMP

Polyhedral Model
• Introduction
• Background
• Pipelined Multithreading Generation
• Experimental Results
• Conclusion

OpenMP
• #pragma-based API for shared memory parallelism
• Worksharing constructs
    • #pragma omp for
    • #pragma omp task
• Synchronization
    • #pragma omp barrier: explicit synchronization barrier
    • omp_set_lock() and omp_unset_lock(): explicit lock mechanism
• Clauses
    • nowait clause on worksharing constructs: omit the implicit barrier at the end of a worksharing construct
    • ordered clause on worksharing constructs: sequentialize a region
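As a small illustration of the two clauses, the sketch below combines them on one loop. The function name and the `order` buffer are assumptions made for the example. The ordered region appends each iteration index to a shared log, so the log shows the order in which the region was entered.

```c
#include <assert.h>

/* Logs the order in which iterations enter the ordered region.
 * Returns 1 if that order is 0, 1, ..., n-1, and 0 otherwise. */
int ordered_visits_in_sequence(int n, int *order)
{
    assert(n >= 0 && order != 0);
    int next = 0; /* shared cursor, only touched inside the ordered region */

    #pragma omp parallel
    {
        /* `ordered` allows the ordered region below; `nowait` removes the
         * barrier that would otherwise follow this worksharing loop. */
        #pragma omp for schedule(static) ordered nowait
        for (int i = 0; i < n; ++i) {
            #pragma omp ordered
            order[next++] = i;
        }
        /* Threads arrive here without waiting for one another; the
         * barrier ending the parallel region still synchronizes them. */
    }

    for (int k = 0; k < n; ++k)
        if (order[k] != k)
            return 0;
    return 1;
}
```

With a caller-provided buffer of n ints, the function returns 1 whether or not OpenMP is enabled, since the ordered region is entered in iteration order regardless of how iterations are distributed over threads.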

Polyhedral Model
• Introduction
• Background
• Pipelined Multithreading Generation
    • Sequential Loop Fission
    • Relaxed nowait prerequisites
    • Alternative: Explicit synchronization
• Experimental Results
• Conclusion

Sequential Loop Fission
• Goal: maximize the number of pipeline stages
• Dependence analysis: identify Strongly Connected Components
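One way to picture the second bullet is Tarjan's algorithm run on a statement-level dependence graph. This is a generic sketch, not necessarily how the authors' compiler computes the components. The edges are an assumption: they are the flow dependences one can read off the ten-statement loop of the fission example (S1 to S8 form a cycle through h[i - 1], S9 and S10 form a cycle through v[i - 1], and S9 reads d[i] from S4).

```c
#include <assert.h>

#define NSTMT 10 /* statements S1..S10 are stored as nodes 0..9 */

/* dep[u][v] != 0 means statement v depends on statement u (edge u -> v). */
static int dep[NSTMT][NSTMT];
static int idx[NSTMT], low[NSTMT], on_stack[NSTMT], stack[NSTMT];
static int sp, counter, ncomp;

/* Recursive step of Tarjan's strongly connected components algorithm. */
static void strongconnect(int u, int *comp)
{
    idx[u] = low[u] = ++counter;
    stack[sp++] = u;
    on_stack[u] = 1;
    for (int v = 0; v < NSTMT; ++v) {
        if (!dep[u][v])
            continue;
        if (idx[v] == 0) { /* v not visited yet */
            strongconnect(v, comp);
            if (low[v] < low[u])
                low[u] = low[v];
        } else if (on_stack[v] && idx[v] < low[u]) {
            low[u] = idx[v];
        }
    }
    if (low[u] == idx[u]) { /* u is the root of a component */
        int v;
        do {
            v = stack[--sp];
            on_stack[v] = 0;
            comp[v] = ncomp;
        } while (v != u);
        ++ncomp;
    }
}

/* Fills comp[] with one component id per statement and returns the
 * number of strongly connected components. */
int find_sccs(int *comp)
{
    assert(comp != 0);
    for (int u = 0; u < NSTMT; ++u) {
        idx[u] = low[u] = on_stack[u] = 0;
        for (int v = 0; v < NSTMT; ++v)
            dep[u][v] = 0;
    }
    sp = counter = ncomp = 0;

    for (int s = 0; s < 7; ++s)
        dep[s][s + 1] = 1; /* S1 -> S2 -> ... -> S8 */
    dep[7][0] = 1;         /* S8 -> S1 through h[i - 1] */
    dep[3][8] = 1;         /* S4 -> S9 through d[i] */
    dep[8][9] = 1;         /* S9 -> S10 through u[i] */
    dep[9][8] = 1;         /* S10 -> S9 through v[i - 1] */

    for (int u = 0; u < NSTMT; ++u)
        if (idx[u] == 0)
            strongconnect(u, comp);
    return ncomp;
}
```

On this graph find_sccs reports two components, {S1, ..., S8} and {S9, S10}. Each component is a candidate pipeline stage, so a finer dependence graph with more components would yield more stages.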

Sequential Loop Fission

(a) Original loop body

    for (int i = 2; i < N; ++i) {
        a[i] = h[i - 1] + R[i];     // S1
        b[i] = a[i - 1] + a[i];     // S2
        c[i] = b[i - 1] + b[i];     // S3
        d[i] = c[i - 1] + c[i];     // S4
        e[i] = d[i - 2] + d[i - 1]; // S5
        f[i] = e[i - 2] + e[i - 1]; // S6
        g[i] = f[i] + X[i];         // S7
        h[i] = g[i] + Y[i];         // S8
        u[i] = v[i - 1] + d[i];     // S9
        v[i] = u[i] + Z[i];         // S10
    }

(b) Fission of Strongly Connected Components

    for (int i = 2; i < N; ++i) {
        a[i] = h[i - 1] + R[i];     // S1
        b[i] = a[i - 1] + a[i];     // S2
        c[i] = b[i - 1] + b[i];     // S3
        d[i] = c[i - 1] + c[i];     // S4
        e[i] = d[i - 2] + d[i - 1]; // S5
        f[i] = e[i - 2] + e[i - 1]; // S6
        g[i] = f[i] + X[i];         // S7
        h[i] = g[i] + Y[i];         // S8
    }
    for (int i = 2; i < N; ++i) {
        u[i] = v[i - 1] + d[i];     // S9
        v[i] = u[i] + Z[i];         // S10
    }
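That the fission preserves the results can be checked directly. In the sketch below, the array size, the input values, and the use of unsigned elements (for exact comparison) are assumptions made for the test. Both versions are run from identical initial data and every array is compared.

```c
#include <assert.h>
#include <string.h>

enum { N = 64 }; /* illustrative size, not taken from the slides */

typedef struct {
    unsigned a[N], b[N], c[N], d[N], e[N], f[N], g[N], h[N], u[N], v[N];
    unsigned R[N], X[N], Y[N], Z[N]; /* read-only inputs */
} arrays;

static void init(arrays *s)
{
    memset(s, 0, sizeof *s);
    for (int i = 0; i < N; ++i) {
        s->R[i] = (unsigned)i + 1u;
        s->X[i] = 2u * (unsigned)i + 1u;
        s->Y[i] = 3u * (unsigned)i + 2u;
        s->Z[i] = 5u * (unsigned)i + 3u;
    }
}

/* First fissioned loop: statements S1..S8 for one value of i. */
static void stage_s1_s8(arrays *s, int i)
{
    s->a[i] = s->h[i - 1] + s->R[i];
    s->b[i] = s->a[i - 1] + s->a[i];
    s->c[i] = s->b[i - 1] + s->b[i];
    s->d[i] = s->c[i - 1] + s->c[i];
    s->e[i] = s->d[i - 2] + s->d[i - 1];
    s->f[i] = s->e[i - 2] + s->e[i - 1];
    s->g[i] = s->f[i] + s->X[i];
    s->h[i] = s->g[i] + s->Y[i];
}

/* Second fissioned loop: statements S9 and S10 for one value of i. */
static void stage_s9_s10(arrays *s, int i)
{
    s->u[i] = s->v[i - 1] + s->d[i];
    s->v[i] = s->u[i] + s->Z[i];
}

/* Returns 1 if the single loop and the two fissioned loops agree. */
int fission_preserves_results(void)
{
    assert(N > 2); /* the loops start at i = 2 */
    arrays orig, split;
    init(&orig);
    init(&split);

    for (int i = 2; i < N; ++i) { /* (a) original loop */
        stage_s1_s8(&orig, i);
        stage_s9_s10(&orig, i);
    }

    for (int i = 2; i < N; ++i)   /* (b) first component */
        stage_s1_s8(&split, i);
    for (int i = 2; i < N; ++i)   /* (b) second component */
        stage_s9_s10(&split, i);

    return memcmp(&orig, &split, sizeof orig) == 0;
}
```

The check passes because the second loop only reads d, which the first loop has fully computed, and the first loop never reads u or v.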
