TMBL Kernels for CUDA GPUs Compile Faster Using PTX
Tony E Lewis, George D Magoulas

Two Major Approaches to GPU Acceleration of GP
- Data parallel: compile new GPU code for each new batch
- Population parallel: write one GPU interpreter to process all batches
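The difference between the two approaches can be sketched in plain C. This is a minimal CPU-side illustration under stated assumptions, not the authors' implementation: the `Instr` encoding and the functions `interpret` and `individual_compiled` are hypothetical, and the three instructions are borrowed from the C example on the PTX slide. A population-parallel system runs one generic interpreter for every individual; a data-parallel system turns each individual into straight-line code like `individual_compiled` and must compile it before running it.

```c
#include <stddef.h>

/* Hypothetical instruction encoding, for illustration only. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } Op;

typedef struct {
    Op op;
    int dst; /* slot that is modified */
    int src; /* slot used as the operand */
} Instr;

#define NUM_SLOTS 5

/* Population parallel: one interpreter evaluates any individual.
   On a GPU this loop would sit inside a single pre-compiled kernel. */
void interpret(const Instr *prog, size_t len, float slots[NUM_SLOTS])
{
    for (size_t i = 0; i < len; ++i) {
        float *d = &slots[prog[i].dst];
        float s = slots[prog[i].src];
        switch (prog[i].op) {
        case OP_ADD: *d += s; break;
        case OP_SUB: *d -= s; break;
        case OP_MUL: *d *= s; break;
        case OP_DIV: *d = (s == 0.0f) ? 0.0f : *d / s; break; /* protected division */
        }
    }
}

/* Data parallel: the same individual written out as straight-line code.
   A data-parallel system generates source like this for every new batch
   and has to compile it before evaluation. */
void individual_compiled(float slots[NUM_SLOTS])
{
    slots[4] += slots[3];
    slots[0] *= slots[3];
    slots[2] = (slots[3] == 0.0f) ? 0.0f : slots[2] / slots[3];
}
```

Both versions produce the same slot values for the program `{OP_ADD, 4, 3}, {OP_MUL, 0, 3}, {OP_DIV, 2, 3}`; the straight-line version avoids the per-instruction `switch`, at the price of a compile step for each batch.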

The Aim of the Work: To Minimise the Weakness of the Data-Parallel Approach
- Data parallel: evaluation very fast; compilation long
- Population parallel: evaluation fast; compilation none

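The Problem: Compilation Stops Small Datasets from Getting Top Speed
With a small dataset, evaluation is quick, so the fixed cost of compiling new GPU code for every batch takes up most of the run time.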
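A rough cost model makes the problem concrete. The sketch below assumes, hypothetically, a fixed compile cost per batch plus an evaluation cost proportional to the number of test cases; `compile_fraction` and its parameters are illustrative and are not measurements from this work.

```c
/* Fraction of one batch's time spent compiling, under a simple model:
   fixed compile cost + evaluation cost linear in the number of test cases.
   All inputs are hypothetical parameters. */
double compile_fraction(double compile_s, double eval_s_per_case, double n_cases)
{
    double eval_s = eval_s_per_case * n_cases;
    return compile_s / (compile_s + eval_s);
}
```

As the number of test cases falls, the evaluation term shrinks while the compile term stays fixed, so the fraction spent compiling approaches 1.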

Two Strategies to Ease the Load on the Compiler; This Talk Is About the First
1. PTX: write the individuals in a lower-level language
2. Alignment: exploit similarities between individuals

Compilation Creates a GPU-ready Binary from C Source Code

Compilation Uses Two Slow Steps (C Source to PTX, then PTX to GPU Binary); This Work Eliminates the First

PTX is a Bit Like Assembly
PTX example:
    mov.f32       %slot0, 0fBFD20CD6;
    add.f32       %slot4, %slot4, %slot3;
    sub.f32       %slot1, %slot1, %testcase0;
    mul.f32       %slot0, %slot0, %slot3;
    div.full.f32  %slot2, %slot2, %slot3;
    setp.eq.f32   %divPred, %slot3, 0f00000000;
    selp.f32      %slot2, 0f00000000, %slot2, %divPred;
Equivalent C example:
    slot0 = -1.64101672f;
    slot4 += slot3;
    slot1 -= testcase0;
    slot0 *= slot3;
    slot2 = ( (slot3 == 0.0f) ? 0.0f : slot2/slot3 );
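Because PTX is plain text, a data-parallel system can emit it directly for each individual instead of emitting C. The sketch below reproduces lines of the PTX example; the function names are hypothetical, and a real module would also need its `.version` and `.target` directives, `.reg` declarations and the surrounding entry function, all omitted here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a float as a PTX single-precision literal: "0f" followed by
   exactly 8 hex digits of its IEEE-754 bit pattern. */
static void ptx_f32_literal(char *buf, size_t n, float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits); /* reinterpret the float's bits without aliasing issues */
    snprintf(buf, n, "0f%08" PRIX32, bits);
}

/* slot[dst] = constant */
int emit_mov_const(char *buf, size_t n, int dst, float value)
{
    char lit[16];
    ptx_f32_literal(lit, sizeof lit, value);
    return snprintf(buf, n, "mov.f32 %%slot%d, %s;\n", dst, lit);
}

/* slot[dst] = slot[dst] <op> slot[src], where op is "add", "sub" or "mul" */
int emit_binary(char *buf, size_t n, const char *op, int dst, int src)
{
    return snprintf(buf, n, "%s.f32 %%slot%d, %%slot%d, %%slot%d;\n", op, dst, dst, src);
}

/* Protected division: divide, test the divisor for zero, then select 0 or the quotient. */
int emit_protected_div(char *buf, size_t n, int dst, int src)
{
    return snprintf(buf, n,
                    "div.full.f32 %%slot%d, %%slot%d, %%slot%d;\n"
                    "setp.eq.f32 %%divPred, %%slot%d, 0f00000000;\n"
                    "selp.f32 %%slot%d, 0f00000000, %%slot%d, %%divPred;\n",
                    dst, dst, src, src, dst, dst);
}
```

For example, `emit_mov_const(buf, sizeof buf, 0, -1.64101672f)` yields the first PTX line above. Text generated this way can be passed straight to the PTX-to-binary step, for instance through the CUDA driver API's `cuModuleLoadData`, so the C-to-PTX step never runs.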

Take a Step Back: What Is the Reason for Doing This Work?
Long Term Fitness Growth

Thought Experiment: Toy Blocks

Thought Experiment: A Tower of Blocks

The Same Problem Is Faced by a GP Tree

How Can We Encourage Long Term Fitness Growth?
Encourage tweaks: mutations that can easily change behaviour without ruining existing functionality

A Representation to Encourage Tweaks
- Linear form, not node-based
- Registers, not a stack
- Iterated execution, not a point of execution
- Instructions that modify, not overwrite
- Long programs
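One way to read this representation is sketched below: a linear list of register-based instructions, each of which modifies its destination slot rather than overwriting it, with the whole program executed for a fixed number of iterations. This is a simplified, hypothetical model (the `ModInstr` type, `run_iterated` and its two operations are assumptions), not TMBL's actual instruction set.

```c
#include <stddef.h>

#define SLOTS 4

/* Hypothetical modify-only operations: each combines the destination's
   current value with the source instead of replacing it. */
typedef enum { MOD_ADD, MOD_MUL } ModOp;

typedef struct {
    ModOp op;
    int dst;
    int src;
} ModInstr;

/* Execute the whole linear program `iterations` times over the slots,
   rather than stepping a single point of execution through it. */
void run_iterated(const ModInstr *prog, size_t len, float slots[SLOTS], int iterations)
{
    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < len; ++i) {
            float s = slots[prog[i].src];
            if (prog[i].op == MOD_ADD)
                slots[prog[i].dst] += s;
            else
                slots[prog[i].dst] *= s;
        }
    }
}
```

Because every instruction only adjusts a slot, a mutation to one instruction tends to shift the result rather than discard what earlier instructions computed.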

The Result: TMBL
Tweaking a Tower of Blocks Leads to a TMBL: Pursuing Long Term Fitness Growth in Program Evolution. Tony E Lewis, George D Magoulas. 2010 IEEE Congress on Evolutionary Computation (CEC), pages 4465-4472.
takesatmbl.wordpress.com

However, PTX Isn't Exactly Like Assembly
It doesn't correspond directly to the resulting binary; for example, many PTX registers get compiled to a few hardware registers.

Will PTX Code Evaluate Slower?
- Maybe yes: we are competing with the CUDA compiler's developers
- Maybe no: we know our code better than the compiler does, so we can guarantee non-divergent branches and can use non-divergent instructions (a = b ? c : d)
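Both points show up in generated PTX. The select form `a = b ? c : d` maps onto `selp`, as in the protected division on the PTX slide, and a branch that all threads take together can be marked `.uni`, which declares it non-divergent. The sketch below wraps a PTX body in such a loop. It assumes every thread runs the same number of iterations and that `%iter` and `%loopPred` are declared elsewhere; the function, label and register names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Wrap a PTX body in a do-while loop that runs it `iterations` times.
   The backward branch is marked .uni because, under the stated assumption,
   every thread in a warp takes it the same number of times. */
int emit_uniform_loop(char *buf, size_t n, const char *body, int iterations)
{
    return snprintf(buf, n,
                    "mov.u32 %%iter, 0;\n"
                    "$Lloop:\n"
                    "%s"
                    "add.u32 %%iter, %%iter, 1;\n"
                    "setp.lt.u32 %%loopPred, %%iter, %d;\n"
                    "@%%loopPred bra.uni $Lloop;\n",
                    body, iterations);
}
```

Because the loop tests its counter after the body, the body runs `iterations` times for any `iterations` of 1 or more.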

Results: Load time is small

Results: Evaluation Speed is Improved

Results: Compile Time is Considerably Reduced (~5.8x)
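How much a faster compile step helps overall depends on how much of each batch was spent compiling. A minimal model, assuming (hypothetically) that only compilation changes and evaluation time stays fixed, is an Amdahl-style ratio; `overall_speedup` and its inputs are illustrative, and only the ~5.8x factor comes from the slide.

```c
/* Overall speedup when only the compile step gets faster.
   compile_s, eval_s: hypothetical per-batch times with the original pipeline.
   compile_speedup: factor by which compilation shrinks (about 5.8 here).
   Any change in evaluation speed is deliberately ignored. */
double overall_speedup(double compile_s, double eval_s, double compile_speedup)
{
    double before = compile_s + eval_s;
    double after = compile_s / compile_speedup + eval_s;
    return before / after;
}
```

When compilation dominates, as with small datasets, the overall gain approaches the full 5.8x; when evaluation dominates, it approaches 1.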

Conclusions
- Complexity
- Maintainability
- Effectiveness
- Possibility of going further

Thanks
- EPSRC
- Reviewers
- You
