Automatic Creation of Tile Size Selection Models Tomofumi Yuki - PowerPoint PPT Presentation

Automatic Creation of Tile Size Selection Models
Tomofumi Yuki, Lakshminarayanan Renganarayanan, Sanjay Rajopadhye, Charles Anderson, Alexandre Eichenberger, Kevin O'Brien
Colorado State University, IBM Research

Tile Size Selection Problem
● Tiling is an optimization with a parameter “tile size”
● Finding good tile sizes is essential to benefit from tiling
● Good tile sizes can be different for each hardware/application


Problems
● Several factors influence performance of tiled code
● Hardware and software keep changing
● Analytical Models (existing approach):
  ● Require expert knowledge and significant time
● Auto Tuning/Iterative Compilation:
  ● Long compilation time
Can we automate TSS model development?
YES, we use ML to automate this process

Outline
● Background
  ● Tiling
  ● Performance considerations for tiled codes
  ● Neural Networks
● Approach
● Performance Evaluation
● Conclusions and Future Work

Tiling
original loop:
  for (i=0; i<=8; i++)
    for (j=0; j<=8; j++)
tiled loop:
  for (ti=0; ti <= 8; ti+=3)
    for (tj=0; tj <= 8; tj+=3)
      for (i=ti; i < ti+3; i++)
        for (j=tj; j < tj+3; j++)


Tiling for Locality
- Array M is indexed by j
- Untiled: 9 locations accessed before next i
- Tiled: 3 locations accessed before next i
=> Better reuse if cache cannot store 9 elements
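The reuse argument on this slide can be checked with a small simulation. The sketch below is illustrative only: it assumes a fully associative LRU cache that holds 3 array elements (the capacity and the helper names lru_cache, lru_access, misses_untiled and misses_tiled are hypothetical, not from the slides) and counts misses on M[j] for the 9x9 loop nest.

```c
#include <stddef.h>

#define N 9         /* loop extent from the slide: i and j run over 0..8 */
#define CAPACITY 3  /* assumed cache size, in elements of M */

/* Tiny fully associative LRU cache of element indices (an assumption). */
typedef struct {
    int slot[CAPACITY];  /* slot[0] is the most recently used entry */
    int used;            /* number of valid slots */
    long misses;
} lru_cache;

static void lru_init(lru_cache *c) { c->used = 0; c->misses = 0; }

static void lru_access(lru_cache *c, int idx)
{
    int pos = -1;
    for (int k = 0; k < c->used; k++)
        if (c->slot[k] == idx) { pos = k; break; }
    if (pos < 0) {                      /* miss */
        c->misses++;
        if (c->used < CAPACITY) c->used++;
        pos = c->used - 1;              /* free slot, or the LRU slot when full */
    }
    for (int k = pos; k > 0; k--)       /* move idx to the front */
        c->slot[k] = c->slot[k - 1];
    c->slot[0] = idx;
}

/* Untiled: all 9 elements of M are touched before the next i. */
long misses_untiled(void)
{
    lru_cache c;
    lru_init(&c);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            lru_access(&c, j);
    return c.misses;
}

/* Tiled: only `tile` elements of M are touched before the next i. */
long misses_tiled(int tile)
{
    lru_cache c;
    lru_init(&c);
    for (int ti = 0; ti < N; ti += tile)
        for (int tj = 0; tj < N; tj += tile)
            for (int i = ti; i < ti + tile && i < N; i++)
                for (int j = tj; j < tj + tile && j < N; j++)
                    lru_access(&c, j);
    return c.misses;
}
```

Under these assumptions the untiled nest misses on every one of its 81 accesses, while the 3x3 tiled nest misses 27 times, because each tile loads its 3 elements once and reuses them for the remaining two values of i.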

Performance Considerations
● Different Types of Cache Misses
  ● Cold Miss
    – Unavoidable cost when data is first read into cache
  ● Capacity Miss
    – Evicted from cache before reuse due to capacity
    – LRU eviction is assumed
  ● Conflict Miss
    – Evicted from cache before reuse due to conflicts
    – Self conflict and cross conflict


Hardware Prefetching
● Hardware to detect access patterns and load data ahead of time
● Large impact on performance of tiled code
Unit-Stride prefetching: next = prev + 1
[Figure: access sequence 1 2 3 4]
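As a rough illustration of the unit-stride rule next = prev + 1, the sketch below counts how many accesses in a sequence of cache-line numbers follow their predecessor by exactly one line. This is a deliberately simplified model, not a description of any real prefetcher (real hardware typically needs a training period and tracks several streams); the function name covered_by_unit_stride is hypothetical.

```c
#include <stddef.h>

/* Count accesses a simplified unit-stride prefetcher could have
   anticipated: an access is "covered" when its line number equals the
   previous line number plus one. This is an assumed, idealized model. */
size_t covered_by_unit_stride(const long *lines, size_t n)
{
    size_t covered = 0;
    for (size_t k = 1; k < n; k++)
        if (lines[k] == lines[k - 1] + 1)
            covered++;
    return covered;
}
```

For the sequence 1, 2, 3, 4 from the slide, every access after the first is covered. A sequence with a larger stride, such as 1, 5, 9, 13, gets no coverage in this model, which is one way to see why the access pattern of tiled code interacts strongly with prefetching.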

Neural Networks
Important Characteristics
- Supervised Learning: Requires input and desired output for training
- Using neural networks is fast (matrix-vector product)
- Many parameters (number of nodes, layers, and so on)
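To make the "matrix-vector product" remark concrete, here is a minimal sketch of evaluating a small feed-forward network. The layer sizes, the single hidden layer, the softsign activation and the names affine, softsign and nn_predict are all assumptions chosen for illustration; the slides do not state the network structure that was actually used.

```c
#include <stddef.h>

#define N_IN 6      /* six program features, as on the TSS model slide */
#define N_HIDDEN 4  /* hypothetical hidden-layer width */

/* y = W x + b, with W stored row-major as rows x cols. */
static void affine(const double *W, const double *b, const double *x,
                   double *y, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        double acc = b[r];
        for (size_t c = 0; c < cols; c++)
            acc += W[r * cols + c] * x[c];
        y[r] = acc;
    }
}

/* Softsign activation, v / (1 + |v|); chosen here only because it
   needs no math library. The real activation is not given. */
static double softsign(double v)
{
    return v / (1.0 + (v < 0.0 ? -v : v));
}

/* One hidden layer followed by a linear output unit. */
double nn_predict(const double W1[N_HIDDEN * N_IN], const double b1[N_HIDDEN],
                  const double W2[N_HIDDEN], double b2, const double x[N_IN])
{
    double h[N_HIDDEN];
    double out;

    affine(W1, b1, x, h, N_HIDDEN, N_IN);     /* hidden = W1 x + b1 */
    for (size_t k = 0; k < N_HIDDEN; k++)
        h[k] = softsign(h[k]);
    affine(W2, &b2, h, &out, 1, N_HIDDEN);    /* output = W2 h + b2 */
    return out;
}
```

Evaluation costs two small matrix-vector products, which is why using a trained network is cheap compared with training it.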

Outline
● Background
● Approach
  ● Class of Programs
  ● TSS Model Structure
  ● Data Collection
  ● Training
  ● Use of the Model
● Performance Evaluation
● Conclusions and Future Work

Class of Programs
● Affine Control Loops
  ● Tiled code generators are available
  ● Many programs that benefit from tiling fit this class
● Constraints on Tiling
  ● One-level tiling for cache locality
  ● Cubic tile sizes
    – To limit data collection time
  ● 2D data, 3D loops
    – 4D+ loops are handled by tiling the innermost 3 loops

TSS Model Structure
● Input: Program Features
  ● High-level characterization of reuse
  ● Total of 6 features
    – Based on number of references in the statement
      (1) Prefetched (2) Non-Prefetched (3) Invariant
    – Each type is further separated by Read/Write
● Output: Optimal Tile Size
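One way to picture the six features is as read and write counts for each of the three reference types. The sketch below is an assumption-heavy illustration: the slides do not say how a reference is classified, so the rule used here (invariant if the innermost loop index does not appear in the subscripts, prefetched if it appears in the last subscript of a row-major array, non-prefetched otherwise) is a guess, and the names array_ref, tss_features, classify_ref and extract_features are hypothetical.

```c
#include <stddef.h>

typedef enum {
    REF_PREFETCHED = 0,
    REF_NON_PREFETCHED = 1,
    REF_INVARIANT = 2
} ref_kind;

/* Hypothetical summary of one array reference in the loop body. */
typedef struct {
    int uses_inner_index;   /* innermost loop index appears in a subscript */
    int inner_in_last_dim;  /* ...and it is the last (fastest) subscript */
    int is_write;           /* 1 for a write, 0 for a read */
} array_ref;

/* Six features: a count per reference type, split by read/write. */
typedef struct {
    int reads[3];
    int writes[3];
} tss_features;

/* Assumed classification rule; not taken from the slides. */
ref_kind classify_ref(const array_ref *r)
{
    if (!r->uses_inner_index)
        return REF_INVARIANT;
    return r->inner_in_last_dim ? REF_PREFETCHED : REF_NON_PREFETCHED;
}

tss_features extract_features(const array_ref *refs, size_t n)
{
    tss_features f = { {0, 0, 0}, {0, 0, 0} };
    for (size_t k = 0; k < n; k++) {
        ref_kind kind = classify_ref(&refs[k]);
        if (refs[k].is_write)
            f.writes[kind]++;
        else
            f.reads[kind]++;
    }
    return f;
}
```

Under this assumed rule, C[i][j] += A[i][k] * B[k][j] with k innermost would yield one prefetched read (A), one non-prefetched read (B), and one invariant read plus one invariant write (C).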

Overview of Our Approach
1. Data Collection
2. Learning TSS Models Using NN
  • One model for each architecture/compiler
3. Use of the Model During Compilation
  • Extract program features
  • Compute NN output
Only step 3 is performed during compilation

Data Collection
● Use of Synthetic Programs
  ● Select a range of program features
  ● Generate programs that have the required features
  ● Run the programs to find optimal tile sizes
● Advantages
  ● Comprehensive and rich training data set
    – Uniform coverage
    – Avoid multiple programs with same features
  ● Easy to get a large set of training data
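The "run the programs to find optimal tile sizes" step can be sketched as a timing loop over candidate cubic tile sizes. The example below uses a tiled matrix multiplication as a stand-in for a generated synthetic program, which is an assumption; the function names matmul_tiled and fastest_tile, the candidate list, and the use of clock() for timing are likewise illustrative rather than the authors' setup.

```c
#include <stddef.h>
#include <time.h>

/* C += A * B for n x n row-major matrices, with one cubic tile size t
   applied to all three loops. t must be at least 1. */
void matmul_tiled(const double *A, const double *B, double *C,
                  size_t n, size_t t)
{
    for (size_t ti = 0; ti < n; ti += t)
        for (size_t tj = 0; tj < n; tj += t)
            for (size_t tk = 0; tk < n; tk += t)
                for (size_t i = ti; i < ti + t && i < n; i++)
                    for (size_t j = tj; j < tj + t && j < n; j++) {
                        double acc = C[i * n + j];
                        for (size_t k = tk; k < tk + t && k < n; k++)
                            acc += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = acc;
                    }
}

/* Time every candidate tile size once and return the fastest one.
   A real measurement would repeat runs and use a finer-grained timer. */
size_t fastest_tile(const double *A, const double *B, double *C, size_t n,
                    const size_t *cands, size_t ncands)
{
    size_t best = cands[0];
    double best_time = -1.0;

    for (size_t c = 0; c < ncands; c++) {
        for (size_t e = 0; e < n * n; e++)   /* reset the output matrix */
            C[e] = 0.0;
        clock_t start = clock();
        matmul_tiled(A, B, C, n, cands[c]);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = cands[c];
        }
    }
    return best;
}
```

Restricting the search to cubic tiles keeps the candidate list one-dimensional, which matches the slide's stated reason for the constraint: limiting data collection time.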

Model Learning and Use
● Model Learning
  ● Neural network parameters are manually tuned
    – Only step in model creation that is not automated
    – After designing a general structure, only minor tuning was required for each architecture
● Use
  ● Feature extraction is straightforward
  ● Computing NN output is instantaneous
  ● Use of the model is inexpensive

Performance Evaluation
● Evaluated by comparing the performance of predicted tiles and the actual optimal
● Trained separate models for each architecture-compiler combination
● 3 architectures, 2 compilers each

Architecture  Compilers  L1 Cache    HW Prefetcher
Opteron       PSC, GCC   64KB 2-way  unit-stride
Power5        XLC, GCC   32KB 4-way  unit-stride
Core2Duo      ICC, GCC   32KB 8-way  constant-stride

Results
Execution time using trained models, normalized to the true optimal
[Figure: bar chart of Normalized Execution Time (axis 0 to 1.4) for MMM, TMM, SSYRK, SSY2K, STRMM, STRSM, LUD, SSYMM, TRISOLV; one bar each for Opteron/PSC, Opteron/GCC, Power5/XLC, Power5/GCC, Core2Duo/ICC, Core2Duo/GCC]
- No worse than 20% slower compared to the true optimal
- Consistent across all architecture-compiler combinations

Performance of LRW
Execution time using LRW, normalized to the true optimal
[Figure: bar chart of Normalized Execution Time (axis 0 to 7) for MMM, TMM, SSYRK, SSY2K, STRMM, STRSM, LUD, SSYMM, TRISOLV; one bar each for Opteron/PSC, Opteron/GCC, Power5/XLC, Power5/GCC, Core2Duo/ICC, Core2Duo/GCC]
- Analytical model that predicts square tiles [LRW]
- Tailored to take HW prefetching into account
[LRW] M.D. Lam, E.E. Rothberg, and M.E. Wolf. 1991

Conclusions & Future Work
● Conclusions
  ● Reasonably accurate TSS models can be automatically constructed with “Semantic Features + Synthetic Programs + NN”
  ● Implemented in the IBM XLC
● Future Work
  ● Extending class of programs
  ● Automatic NN parameter tuning
  ● Extract insight from the model

Questions?

