GRAPHICS PROCESSING UNIT
Mahdi Nazm Bojnordi
Assistant Professor, School of Computing, University of Utah
CS/ECE 6810: Computer Architecture
Overview
• Announcement
  ◦ Homework 6 will be available tonight (due on 04/18)
• This lecture
  ◦ Classification of parallel computers
  ◦ Graphics processing
  ◦ GPU architecture
  ◦ CUDA programming model
Flynn’s Taxonomy
• Data vs. instruction streams: a 2x2 classification

                                Instruction Stream
                          Single                      Multiple
  Data      Single    SISD: uniprocessors         MISD: systolic arrays
  Stream    Multiple  SIMD: vector processors     MIMD: multicores
Graphics Processing Unit
• Initially developed as a graphics accelerator
  ◦ It receives geometry information from the CPU as an input and provides a picture as an output

  [Pipeline: host interface → Vertex Processing → Triangle Setup → Pixel Processing → memory interface]
Host Interface
• The host interface is the communication bridge between the CPU and the GPU
• It receives commands from the CPU and also pulls geometry information from system memory
• It outputs a stream of vertices in object space with all their associated information
Vertex Processing
• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space
• This may be a simple linear transformation, or a complex operation involving morphing effects (see the sketch below)
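As a sketch of the simple linear case, the C function below (names hypothetical) applies one 4x4 matrix, assumed to be a combined model-view-projection transform, to a vertex in object space:

// Minimal vertex transform: object space -> screen (clip) space.
// 'mvp' is an assumed, precomputed model-view-projection matrix.
void transform_vertex(const float mvp[4][4], const float in[4], float out[4]) {
    for (int r = 0; r < 4; r++) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; c++)
            out[r] += mvp[r][c] * in[c];   // 4x4 matrix times 4-vector
    }
}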
Pixel Processing
• Rasterize triangles to pixels
• Each fragment provided by triangle setup is fed into fragment processing as a set of attributes (position, normal, texture coordinates, etc.), which are used to compute the final color for this pixel
• The computations taking place here include texture mapping and math operations (sketched below)
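A minimal sketch of the per-fragment math, assuming interpolated attributes arrive as plain arrays (the names here are illustrative, not a specific shading API): diffuse color = albedo * max(N·L, 0).

// Diffuse shading for one fragment.
// 'normal' and 'light_dir' are assumed to be unit vectors.
void shade_fragment(const float normal[3], const float light_dir[3],
                    const float albedo[3], float color[3]) {
    float ndotl = normal[0]*light_dir[0] + normal[1]*light_dir[1]
                + normal[2]*light_dir[2];
    if (ndotl < 0.0f) ndotl = 0.0f;        // surface faces away from the light
    for (int i = 0; i < 3; i++)
        color[i] = albedo[i] * ndotl;      // a texture lookup would modulate this
}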
Programming GPUs
• The programmer can write programs that are executed for every vertex as well as for every fragment
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications

  [Pipeline: host interface → Vertex Processing → Triangle Setup → Pixel Processing → memory interface, with the vertex and pixel stages programmable]
Memory Interface
• Fragment colors provided by the previous stage are written to the framebuffer
• This used to be the biggest bottleneck before fragment processing took over
• Before the final write occurs, some fragments are rejected by the z-buffer, stencil, and alpha tests
• On modern GPUs, z and color are compressed to reduce framebuffer bandwidth (but not size)
Z-Buffer
• The z-buffer stores, per pixel, the depth of the closest fragment written so far; an incoming fragment survives only if it is closer
• Example with 3 overlapping objects
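A minimal sketch of that test, assuming smaller z means closer to the viewer, and with zbuf/fb as hypothetical per-pixel depth and color arrays:

// Depth test for one fragment at pixel (x, y) in a framebuffer of width w.
void z_test(float *zbuf, unsigned *fb, int w, int x, int y,
            float frag_z, unsigned frag_color) {
    int idx = y * w + x;
    if (frag_z < zbuf[idx]) {   // fragment is closer than what is stored
        zbuf[idx] = frag_z;     // update the depth
        fb[idx]   = frag_color; // and the color
    }                           // otherwise the fragment is rejected
}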
Graphics Processing Unit
• Initially developed as graphics accelerators
  ◦ now one of the densest compute engines available
• Many efforts to run non-graphics workloads on GPUs
  ◦ general-purpose GPUs (GPGPUs)
• C/C++ based programming platforms
  ◦ CUDA from NVIDIA and OpenCL from an industry consortium
• A heterogeneous system
  ◦ a regular host CPU
  ◦ a GPU that handles CUDA (may be on the same CPU chip)
Graphics Processing Unit
• Simple in-order pipelines that rely on thread-level parallelism to hide long latencies
• Many registers (~1K) per in-order pipeline (lane) to support many active warps

  [Figure: a CPU spends its area on control logic and caches feeding a few ALUs; a GPU spends it on many simple ALUs with small control and cache, both backed by DRAM]
The GPU Architecture
• SIMT: single instruction, multiple threads
  ◦ GPU has many SIMT cores
• Application → many thread blocks (one block per SIMT core)
• Thread block → many warps (the warps of one block share that SIMT core)
• Warp → many in-order pipelines (SIMD lanes)
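A minimal CUDA sketch of this hierarchy: the launch creates a grid of thread blocks, each block is scheduled onto a SIMT core, and its threads are grouped into warps of warpSize (32) lanes that issue the same instruction together.

__global__ void scale(float *data, float a, int n) {
    // blockIdx selects the thread block; threadIdx selects the thread
    // (SIMD lane) within it; hardware groups threads into warps.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= a;
}
// Launch as a grid of blocks, one thread per element, e.g.:
// scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);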
Why GPU Computing?
  [Figure: CPU vs. GPU peak performance trends. Source: NVIDIA]
GPU Computing
• GPU as an accelerator in scientific applications
GPU Computing
• Low latency or high throughput?
  ◦ CPUs are optimized to minimize the latency of a single thread; GPUs are optimized to maximize aggregate throughput across thousands of threads
CUDA Programming Model
• Step 1: substitute library calls with equivalent CUDA library calls
  ◦ saxpy( … ) → cublasSaxpy( … )
    ▪ single-precision alpha x plus y (y = αx + y)
• Step 2: manage data locality
  ◦ cudaMalloc(), cudaMemcpy(), etc.
• Step 3: transfer data between the CPU and the GPU
  ◦ get and set functions (e.g., cublasSetVector(), cublasGetVector())
• Finally, rebuild and link against the CUDA-accelerated library
  ◦ nvcc myobj.o -lcublas
Example: SAXPY Code

int N = 1 << 20;
// Perform SAXPY on 1M elements: y[] = a*x[] + y[]
saxpy(N, 2.0, x, 1, y, 1);
Example: CUDA Lib Calls

int N = 1 << 20;
// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);
Example: Initialize CUDA Lib

int N = 1 << 20;
cublasInit();
// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);
cublasShutdown();
Example: Allocate Memory

int N = 1 << 20;
cublasInit();
cublasAlloc(N, sizeof(float), (void**)&d_x);
cublasAlloc(N, sizeof(float), (void**)&d_y);
// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);
cublasFree(d_x);
cublasFree(d_y);
cublasShutdown();
Example: Transfer Data

int N = 1 << 20;
cublasInit();
cublasAlloc(N, sizeof(float), (void**)&d_x);
cublasAlloc(N, sizeof(float), (void**)&d_y);
cublasSetVector(N, sizeof(x[0]), x, 1, d_x, 1);
cublasSetVector(N, sizeof(y[0]), y, 1, d_y, 1);
// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);
cublasGetVector(N, sizeof(y[0]), d_y, 1, y, 1);
cublasFree(d_x);
cublasFree(d_y);
cublasShutdown();
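For contrast with the drop-in cuBLAS path above, a minimal hand-written CUDA version of the same computation (a sketch: x and y are assumed to be the host arrays from before, and error checking is omitted):

// Each thread computes one element of y = a*x + y.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Host side: allocate device memory, copy in, launch, copy back.
int N = 1 << 20;
float *d_x, *d_y;
cudaMalloc((void**)&d_x, N * sizeof(float));
cudaMalloc((void**)&d_y, N * sizeof(float));
cudaMemcpy(d_x, x, N * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N * sizeof(float), cudaMemcpyHostToDevice);
saxpy_kernel<<<(N + 255) / 256, 256>>>(N, 2.0f, d_x, d_y);
cudaMemcpy(y, d_y, N * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(d_x);
cudaFree(d_y);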
Compiling CUDA
• Call nvcc on the C/C++ CUDA application
• Parallel Thread eXecution (PTX)
  ◦ a virtual machine and ISA
• Two-stage compilation
  ◦ 1. C/C++ → PTX (NVCC, which also emits the CPU host code)
  ◦ 2. PTX → device-specific binary (the PTX-to-target compiler, producing target code for G80 and other GPUs)
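As an illustration of the two stages (file names hypothetical), either stop at PTX and finish with the PTX assembler, or let nvcc drive both stages:

nvcc -ptx saxpy.cu -o saxpy.ptx              # stage 1: C/C++ -> PTX (virtual ISA)
ptxas -arch=sm_70 saxpy.ptx -o saxpy.cubin   # stage 2: PTX -> device-specific binary
nvcc -arch=sm_70 saxpy.cu -o saxpy           # or: both stages in one nvcc invocation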
Memory Hierarchy
• Throughput-oriented main memory
  ◦ Graphics DDR (GDDR)
    ▪ wide channels: 256 bit
    ▪ lower clock rate than DDR
  ◦ 1.5MB shared L2
  ◦ 48KB read-only data cache
    ▪ compiler controlled
  ◦ wide buses

  [Figure: each thread accesses shared memory/L1 cache and the read-only data cache, backed by the shared L2 cache and DRAM]
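A minimal CUDA sketch tying these together: __shared__ data lives in the on-chip shared memory/L1 storage, while const __restrict__ pointers (or __ldg()) mark loads as eligible for the read-only data cache. The kernel and its names are illustrative.

__global__ void sum_tiles(const float* __restrict__ in, float *out, int n) {
    __shared__ float tile[256];            // one tile per block, in shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // 'in' is read-only for the whole kernel, so this load can be
    // serviced through the read-only data cache.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                       // make all loads visible block-wide
    if (threadIdx.x == 0) {                // thread 0 reduces the tile
        float s = 0.0f;
        for (int t = 0; t < blockDim.x; t++)
            s += tile[t];
        out[blockIdx.x] = s;               // one partial sum per block
    }
}
// Launch with 256 threads per block to match the tile size, e.g.:
// sum_tiles<<<(n + 255) / 256, 256>>>(d_in, d_partial, n);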