Sanjay Rajopadhye Colorado State University
- Class objectives, goals, introduction
- CUDA performance tuning (wrap up)
- Equational Programming (intro)
- Parallel programming is hard
- “End of the free lunch” [Sut05]
- Arrival of “manycores” signals the end of “La-Z-Boy Programming” [Pat06]

Becoming a parallel programming expert will get you a good job
But your skills may become obsolete: new machines, new languages, …
Parallelism must return to La-Z-Boy programming

[Sut05] Herb Sutter. “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.” Dr. Dobb's Journal, vol. 30, no. 3, 2005.
[Pat06] David Patterson. Keynote talk at the International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006), New Orleans, LA.
- Short term
  Become macho GPU programmer: write “heroically tuned” codes.
- Medium term
  Do it systematically: tuning for GTX 280 vs tuning for GTX 465: learn principles, not skills
- Long term
  Do it automatically: learn the foundations of automatic compilation. Focus on a “regular subset” of programs
  - Polyhedral Equational Model
- Big picture
- Polyhedral Equations as programs: I’m loath to write C, despite the slogan “C no evil”
- Equations vs (conventional) loop programs
- Equations-to-code (compiling equations)
  - Schedule
  - (processor) allocation
  - (memory) allocation
- But what about parallelism?
10 assignments (basic + advanced) + term project
- CUDA performance tuning (2)
- Equational programming: Alpha/AlphaZ (1)
- Mathematical foundations: polyhedra, affine functions, and operations (2)
- Alpha analysis/transformation (1)
- Analysis: scheduling & allocation (2)
- Code generation/tiling (2)
- Assignments (30%)
- Midterm (take home) (30%)
- Final project (30% = 2+3+5+15+5)
  - Proposal
  - Advancement report
  - Final report
  - Quality of work
  - Final poster
- Participation/Discussion/Quizzes (10%)
- What are polyhedra?
- Why are they useful/important?
- What is the polyhedral model?
- What is a model?
  - A mathematical/computational/mechanical/… abstraction of some other (physical) entity
  - Objects in the model must “emulate” the “natural operations” of the modeled entities: semantics
[Figure, from Feautrier’s keynote at LCPC 2009: a map of the work leading to the polytope model, dated 1966 to 2005 and grouped under Dependences, Systolic Array Design, Scheduling, Placement, Code Generation, Tiling, Array Shrinking, Automatic Parallelization, Locality, and HLS.]
- Physical entity: programs/computations
- The Polyhedral Model is a “very high level” intermediate representation (IR) of “regular computations”
- Polyhedral equational model: real=abstract
- Amenable to:
  - Mathematical static analysis
  - Transformation within model: closure
  - Transformation outside model: (tiled) code generation
- Class objectives, goals, introduction
- CUDA performance tuning (wrap up)
- Equational Programming (intro)?
- Many resources on the web (NVIDIA webinars)
- Coalescing (HW1a)
  - Challenge question: achieve maximum bandwidth with the fewest threads per block
  - For a “strided-by-block” access pattern
- Arithmetic peak: warps and “virtualization”
- Bank conflicts in shared memory
- MAXPYrep:
  - Repeatedly execute Y=A*X+Y
  - Where A, X and Y are matrices
  - Matrices are small enough to fit in shared memory (ignore global memory access coalescing)
  - Goal: achieve machine peak
- Port all previous performance to GTX 480
- And beyond …
- Teach me
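A minimal CPU reference for MAXPYrep, useful only for checking the output of a tuned kernel. It is a sketch under assumptions that are not on the slide: the matrices are square, stored as lists of lists, and the names `maxpy`, `maxpy_rep` and `flop_count` are hypothetical.

```python
def maxpy(A, X, Y):
    """Return A*X + Y for n-by-n matrices given as lists of lists (assumed square)."""
    n = len(A)
    return [[sum(A[i][k] * X[k][j] for k in range(n)) + Y[i][j]
             for j in range(n)]
            for i in range(n)]


def maxpy_rep(A, X, Y, reps):
    """Apply Y = A*X + Y `reps` times and return the final Y."""
    for _ in range(reps):
        Y = maxpy(A, X, Y)
    return Y


def flop_count(n, reps):
    """Floating-point operations: n multiplies and n adds per output entry, per repetition."""
    return 2 * n**3 * reps


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    X = [[1, 2], [3, 4]]
    zero = [[0, 0], [0, 0]]
    print(maxpy_rep(identity, X, zero, 3))
    print(flop_count(2, 3))
```

Dividing `flop_count(n, reps)` by a measured kernel time gives an achieved rate that can be compared against the machine peak named in the goal.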
- Oxford CUDA conf (CUDA webinar online)
- “Identifying Performance Limiters,” Micikevicius, NVIDIA/UCF (CUDA webinar)
- “Roofline for Fast Math,” Sam Williams, LBL
- Wiki page for Pascal’s Triangle http://en.wikipedia.org/wiki/Pascal's_triangle
- … and also a non-standard way to compute Fibonacci numbers
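A small sketch of both items, written so that the code mirrors the defining equation: P[i][j] = 1 on the boundary (j = 0 or j = i) and P[i][j] = P[i-1][j-1] + P[i-1][j] in the interior, over the triangular domain 0 <= j <= i. The Fibonacci route shown here, summing the shallow diagonals of the triangle, is one known non-standard method; that it is the one the slide has in mind is an assumption, and the function names are hypothetical.

```python
def pascal(n_rows):
    """Build the first n_rows rows of Pascal's triangle from its defining equation."""
    P = []
    for i in range(n_rows):
        row = [1 if j == 0 or j == i else P[i - 1][j - 1] + P[i - 1][j]
               for j in range(i + 1)]
        P.append(row)
    return P


def fib_from_pascal(n):
    """Return F(n+1), with F(1) = F(2) = 1, as the sum of P[n-k][k] for 0 <= k <= n//2."""
    P = pascal(n + 1)
    return sum(P[n - k][k] for k in range(n // 2 + 1))


if __name__ == "__main__":
    for row in pascal(5):
        print(row)
    print([fib_from_pascal(n) for n in range(8)])
```

Both functions are defined over index sets bounded by affine inequalities (0 <= j <= i for the triangle, 0 <= 2k <= n for the diagonal sum), which is the kind of domain the polyhedral equational model works with.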