Sanjay Rajopadhye Colorado State University

n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro) 2

n Parallel Programming is hard n “End of the free lunch” [Sut05] n Arrival of “manycores” signals the end of “La-Z-Boy Programming” [Pat06] Becoming a parallel programming expert will get you a good job But your skills may become obsolete – new machines, new languages, … Parallelism must return to La-Z-Boy programming [Sut05] Herb Sutter. “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency,” in Software. Dr. Dobb's Journal, vol. 30, no. 3, 2005. [Pat06] David Patterson, in keynote talk at the International Workshop on Languages and Compilers For Parallel Computers LCPC 2006, New Orleans, LA. 3

n Short term Become macho GPU programmer: write “heroically tuned” codes. n Medium term Do it systematically: tuning for GTX 280 vs tuning for GTX 465: learn principles, not skills n Long term Do it automatically: Learn the foundations of automatic compilation. Focus on a “regular subset” of programs n Polyhedral Equational Model 4

n Big picture n Polyhedral Equations as programs: I’m loath to write C, despite the slogan “C no evil” n Equations vs (conventional) loop programs n Equations-to-code (compiling equations) n Schedule n (processor) allocation n (memory) allocation n But what about parallelism? 5

10 assignments (basic + advanced) + term project; number of assignments per topic:
- CUDA performance tuning (2)
- Equational programming: Alpha/AlphaZ (1)
- Mathematical foundations: polyhedra, affine functions, and operations (2)
- Alpha analysis/transformation (1)
- Analysis: scheduling & allocation (2)
- Code generation/tiling (2)

n Assignments (30%) n Midterm (take home) (30%) n Final project (30% = 2+3+5+15+5) n Proposal n Advancement report n Final report n Quality of work n Final poster n Participation/Discussion/Quizzes (10%) 7

n What are polyhedra? n Why are they useful/important n What is the polyhedral model? 8

n What is a model? n A mathematical/computational/mechanical/ … abstraction of some other (physical) entity n Objects in the model must “emulate” the “natural operations” of the modeled entities – semantics 9

[Figure, from Feautrier’s keynote at LCPC 2009: a timeline of the polyhedral model covering dependences, systolic array design, scheduling, placement, code generation, the polytope model, tiling, array shrinking, automatic parallelization, locality, and HLS. Early work shown includes Bernstein (1966), Karp, Miller, and Winograd (1967), Lamport (1974), Banerjee (1976), Cousot and Halbwachs (1977), and H. T. Kung (1978).]

n Physical entity: programs/computations n The Polyhedral Model is a “very high level” intermediate representation (IR) of “regular computations” n Polyhedral equational model: real=abstract n Amenable to: n Mathematical static analysis n Transformation within model: closure n Transformation outside model: (tiled) code generation 11

n Class objectives, goals, introduction n CUDA performance tuning (wrap up) n Equational Programming (intro)? 12

n Many resources on the web (NVIDIA webinars) n Coalescing (HW1a) n Challenge question: Achieve maximum bandwidth, with fewest threads-per-block n For a “strided-by-block” access pattern. n Arithmetic peak: warps and “virtualization” n Bank conflicts in shared memory 13

n MAXPYrep: n Repeatedly execute Y=A*X+Y n Where A, X and Y are matrices n Matrices are small enough to fit in shared memory (ignore global memory access coalescing) n Goal: achieve machine peak n Port all previous performance to GTX 480 n And beyond … n Teach me 14

n Oxford CUDA conf (CUDA webinar online) n “Identifying Performance Limiters,” Micikevicius NVIDIA/UCF (CUDA webinar) n “Roofline for Fast Math” Sam Williams, LBL 15

n Wiki page for Pascal’s Triangle http://en.wikipedia.org/wiki/Pascal's_triangle � n … and also a non-standard way to compute Fibonacci numbers 16
