AlphaZ: A System for Design Space Exploration in the Polyhedral Model
Tomofumi Yuki, Gautam Gupta, DaeGon Kim, Tanveer Pathan, and Sanjay Rajopadhye
Polyhedral Compilation
- The Polyhedral Model
  - Now a well-established approach to automatic parallelization
  - Based on a mathematical formalism
  - Works well for regular/dense computations
- Many tools and compilers: PIPS, PLuTo, MMAlpha, RStream, GRAPHITE (gcc), Polly (LLVM), ...
Design Space (still a subset)
- Space-time mapping + tiling: schedule + parallel loops
  - Primary focus of existing tools
- Memory allocation
  - Most tools for general-purpose processors do not modify the original allocation
  - Complex interaction with the space-time mapping
- Higher-level optimizations
  - Reduction detection
  - Simplifying reductions (complexity reduction)
AlphaZ
- Tool for exploration
  - Provides a collection of analyses, transformations, and code generators
- Unique features
  - Memory allocation
  - Reductions
- Can be used as a push-button system
  - e.g., parallelization à la PLuTo is possible
  - Not our current focus
This Paper: Case Studies
- adi.c from PolyBench
  - Reconsidering the memory allocation allows the program to be fully tiled
  - Outperforms PLuTo, which tiles only the inner loops
- UNAfold (RNA folding application)
  - Complexity reduction from O(n⁴) to O(n³)
  - Application of the transformations is fully automatic
This Talk: Focus on Memory
- Tiling requires more memory
  - e.g., the Smith-Waterman dependence pattern
[Figure: Smith-Waterman dependences under sequential vs. tiled execution]
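To illustrate why a sequential schedule can get by with less memory than a tiled one, consider a DP table with Smith-Waterman-style dependences, where H[i][j] reads H[i-1][j-1], H[i-1][j], and H[i][j-1]. A row-by-row sequential schedule lets row storage be recycled, whereas a tiled schedule must keep tile-boundary values live until neighboring tiles execute. The sketch below is a simplified, hypothetical recurrence (no gap penalties or zero clamp, so it is not the real Smith-Waterman scoring), meant only to show the dependence pattern and the row-recycled allocation:

```c
#include <string.h>

#define N 6
#define M 6

static int max3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Full-table DP with Smith-Waterman-style dependences:
   H[i][j] reads H[i-1][j-1], H[i-1][j], and H[i][j-1]. */
int dp_full(int s[N][M]) {
    static int H[N + 1][M + 1];
    memset(H, 0, sizeof H);
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= M; j++)
            H[i][j] = max3(H[i-1][j-1] + s[i-1][j-1], H[i-1][j], H[i][j-1]);
    return H[N][M];
}

/* A sequential row order lets one row be recycled: O(M) memory.
   Under tiled execution this recycling is no longer legal, because
   a later tile may still need values that would be overwritten. */
int dp_two_rows(int s[N][M]) {
    int prev[M + 1] = {0}, cur[M + 1] = {0};
    for (int i = 1; i <= N; i++) {
        cur[0] = 0;
        for (int j = 1; j <= M; j++)
            cur[j] = max3(prev[j-1] + s[i-1][j-1], prev[j], cur[j-1]);
        memcpy(prev, cur, sizeof prev);
    }
    return prev[M];
}
```

Both versions compute the same table entries; only the amount of live storage differs.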
ADI-like Computation
- Updates a 2D grid within an outer time loop
- PLuTo tiles only the inner two dimensions
  - Due to a memory-based dependence
- With an extra scalar, it becomes tilable in all three dimensions
- The PolyBench implementation has a bug
  - It does not correctly implement ADI
  - ADI itself is not tilable in all dimensions
adi.c: Original Allocation

  for (t = 0; t < tsteps; t++) {
    for (i1 = 0; i1 < n; i1++)
      for (i2 = 0; i2 < n; i2++)
        X[i1][i2] = foo(X[i1][i2], X[i1][i2-1], ...)
    ...
    for (i1 = 0; i1 < n; i1++)
      for (i2 = n-1; i2 >= 1; i2--)
        X[i1][i2] = bar(X[i1][i2], X[i1][i2-1], ...)
    ...
  }

- Not tilable because of the reverse loop
- Memory-based dependence: (i1,i2 -> i1,i2+1)
- Tiling requires all dependences to be non-negative
adi.c: Original Allocation

  for (i2 = 0; i2 < n; i2++)
    S1: X[i1][i2] = foo(X[i1][i2], X[i1][i2-1], ...)
  ...
  for (i2 = n-1; i2 >= 1; i2--)
    S2: X[i1][i2] = bar(X[i1][i2], X[i1][i2-1], ...)
  ...

[Figure: row X[i1] is written by S1 and then overwritten in place by S2]
adi.c: With Extra Memory
- Once the two loops are fused:

  for (i2 = 0; i2 < n; i2++)
    S1: X[i1][i2] = foo(X[i1][i2], X[i1][i2-1], ...)
  ...
  for (i2 = 1; i2 < n; i2++)
    S2: X'[i1][i2] = bar(X[i1][i2], X[i1][i2-1], ...)
  ...

- The value of X only needs to be preserved for one iteration of i2
- We don't need a full array X', just a scalar

[Figure: rows X[i1] and X'[i1]]
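The fusion can be sketched in one dimension. In the sketch below, foo and bar are hypothetical stand-ins for the real ADI updates; the point is that once S1 and S2 are fused, S1's value of X[i2-1] must survive exactly one iteration, so the full copy X' shrinks to a single scalar:

```c
#include <string.h>

#define N 8

/* Hypothetical stand-ins for the real ADI stencil updates. */
static double foo(double a, double b) { return 0.5 * a + b; }
static double bar(double a, double b) { return a - 0.25 * b; }

/* Reference: the two sweeps kept separate, with a full copy Xp
   playing the role of X'. */
void sweep_two_arrays(double X[N]) {
    double Xp[N];
    for (int i2 = 1; i2 < N; i2++)        /* S1 */
        X[i2] = foo(X[i2], X[i2-1]);
    Xp[0] = X[0];
    for (int i2 = 1; i2 < N; i2++)        /* S2 reads S1's values */
        Xp[i2] = bar(X[i2], X[i2-1]);
    memcpy(X, Xp, sizeof(double) * N);
}

/* Fused: S1's value of X[i2-1] only needs to live one iteration,
   so the full array X' shrinks to the scalar `prev`. */
void sweep_scalar(double X[N]) {
    double prev = X[0];
    for (int i2 = 1; i2 < N; i2++) {
        double cur = foo(X[i2], prev);    /* S1, using S1's X[i2-1] */
        X[i2] = bar(cur, prev);           /* S2 overwrites X in place */
        prev = cur;                       /* preserve S1's value */
    }
}
```

Both versions perform the same floating-point operations in the same order, so they produce identical results; only the memory footprint differs.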
adi.c: Performance
[Figure: speedup of the optimized code over the original, for AlphaZ and PLuTo, vs. number of threads (cores): up to 24 threads on a Cray XT6m and up to 8 threads on a Xeon]
- PLuTo does not scale because the outer loop is not tiled
UNAfold
- UNAfold [Markham and Zuker 2008]
  - RNA secondary structure prediction algorithm
  - An O(n³) algorithm was known [Lyngso et al. 1999]
    - but too complicated to implement
    - a "good enough" workaround exists
- AlphaZ
  - Systematically transforms the O(n⁴) version into the O(n³) one
  - Most of the process can be automated
UNAfold: Optimization
- Key: Simplifying Reductions [POPL 2006]
  - Finds "hidden scans" in reductions
  - A rare case where the compiler can reduce complexity
- Almost automatic:
  - The O(n⁴) section must be separated out
    - many boundary cases
  - Functions must be inlined to expose reuse
  - Transformations to perform the above are available; no manual modification of code
UNAfold: Performance
[Figure: execution time of UNAfold (original vs. simplified) against sequence length N, shown both in seconds and as a log-log plot; the fitted log-log slopes are 4 for the original and 3 for the simplified version]
- The complexity reduction is empirically confirmed
AlphaZ System Overview
- Target Mapping: specifies the schedule, memory allocation, etc.
[Figure: C and Alpha programs enter the polyhedral representation; transformations and analyses operate on it, guided by the Target Mapping; code generators emit C+OpenMP, C+CUDA, and C+MPI]
Human-in-the-Loop
- Automatic parallelization: the "holy grail" goal
  - Current automatic tools are restrictive
    - A strategy that works well is hard-coded
    - Difficult to pass in domain-specific knowledge
- Human-in-the-loop
  - Provide full control to the user
  - Help find new "good" strategies
  - Guide the transformation with domain-specific knowledge
Conclusions
- There are more strategies worth exploring
  - some may currently be difficult to automate
- Case studies
  - adi.c: memory
  - UNAfold: reductions
- AlphaZ: a tool for trying out new ideas
Acknowledgements
- AlphaZ developers/users
  - Members of Mélange at CSU
  - Members of CAIRN at IRISA, Rennes
  - Dave Wonnacott at Haverford College and his students
Key: Simplifying Reductions
- Simplifying Reductions [POPL 2006]
  - Finds "hidden scans" in reductions
  - A rare case where the compiler can reduce complexity
- Main idea:

    X[i] = Σ_{k=0}^{i} A[k]                    O(n²)

  can be written as

    X[i] = { A[i]             if i = 0
           { X[i-1] + A[i]    if i > 0         O(n)
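The idea can be sketched directly in C (a hypothetical prefix-sum example, not AlphaZ output): the naive loop nest recomputes each sum from scratch, for O(n²) additions in total, while the hidden scan reuses X[i-1] and needs only O(n):

```c
#define N 64

/* Naive evaluation of X[i] = sum_{k=0..i} A[k]: every X[i] is
   recomputed from scratch, O(n^2) additions overall. */
void prefix_naive(const long A[N], long X[N]) {
    for (int i = 0; i < N; i++) {
        X[i] = 0;
        for (int k = 0; k <= i; k++)
            X[i] += A[k];
    }
}

/* The hidden scan: X[0] = A[0], X[i] = X[i-1] + A[i], O(n). */
void prefix_scan(const long A[N], long X[N]) {
    X[0] = A[0];
    for (int i = 1; i < N; i++)
        X[i] = X[i-1] + A[i];
}
```

Both routines produce identical results; the transformation changes only the operation count, which is exactly the O(n⁴) to O(n³) effect in UNAfold at a higher dimension.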