S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND - PowerPoint PPT Presentation

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND CODE ANALYSIS DEVELOPER TOOLS Aurelien Chartier & Steve Ulrich, March 19 th , 2019

Debugging Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio Steve Ulrich CUDA Memcheck AGENDA Debug API Sanitizer API Aurelien Chartier 2

DEBUGGING TOOLS Steve Ulrich, March 19 th , 2019

WHO ARE YOU? Have you … • Ever written a CUDA application? Developed in CUDA for… • … over a year? • • … over five years? • Windows or Linux? • Desktop or Embedded? 4

WHAT ARE YOUR PREFERENCES? CLI vs. IDE? How many spaces in a tab? Big Endian vs. Little Endian? 5

DEBUGGING STYLES Before using a debugger: Reboot (“you may have a future in tech support ☺ ”) Google the error message Ask for help printf() assert() And eventually… Break out the debugger 6

NVIDIA DEBUGGER PORTFOLIO IDE Tools CLI Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio CUDA MEMCHECK Development Libraries CUDA Debug API 7

NSIGHT ECLIPSE Eclipse IDE for CUDA Install Nsight Eclipse plugins in your own Eclipse environment • • Supported in Eclipse version 4.7/4.8/4.9 for CUDA 10.1 (4.10 is in validation) Full featured IDE to edit, build and debug CUDA applications • NVCC build integration to cross compile for various target • platforms(x86/L4T/Drive Linux/Drive QNX). Debugger - Seamless and simultaneous debugging of both CPU • and GPU code using CUDA GDB • New with 10.1 - Docker Support 8

NSIGHT ECLIPSE EDITION Docker Support Use the Drive OS Docker images from NVIDIA GPU Cloud to build the CUDA • projects using Nsight EE. Nsight EE plugins supports mounting the project source files from the host to • docker container and build the projects using CUDA toolkit/toolchains available inside docker container. No CUDA toolkit installation is required on the host. You can choose the docker image to use when creating the project or in • preference page. • Remote debug applications from Nsight EE using cuda-gdb inside docker container.. 9

CUDA-GDB Command line source and assembly (SASS) level debugger Simultaneous CPU and GPU debugging Inspect and modify memory, register, variable state Control program execution Support for multiple GPUs, multiple contexts, multiple kernels 10

CUDA-GDB 11

EVER HAD A CRASH AND WISHED YOU’D BEEN DEBUGGING ? $ CUDA_DEVICE_WAITS_ON_EXCEPTION=1 12

EVER HAD A CRASH AND WISHED YOU’D BEEN DEBUGGING? $ CUDA_DEVICE_WAITS_ON_EXCEPTION=1 13

WHAT ABOUT CORE DUMPS? $ CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 14

WHAT ABOUT CORE DUMPS? $ CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 15

NSIGHT VISUAL STUDIO Visual Studio IDE for CUDA Native CUDA C/C++ GPU Debugging Source-correlated assembly debugging (SASS / PTX / SASS+PTX) Data breakpoints for CUDA C/C++ code Expressions in Locals, Watch and Conditionals 16

NSIGHT VISUAL STUDIO 17

CUDA-MEMCHECK Four tools in One: Memory Checker Race Condition Checker Initialization Checker Synchronization Checker 18

CUDA DEBUG API Enable 3 rd -party Debuggers ABI Support Exception Reporting Attach and Detach Runtime control State Inspection 19

WHERE TO GET THE TOOLS? https://developer.nvidia.com/tools-overview 20

CODE ANALYSIS – SANITIZER API Aurelien Chartier, March 19 th , 2019

EVER… Used cuda-memcheck but needed more control? • • Needed a tool that let you analyze memory access patterns in your code? Wished for a PIN-like binary instrumentation tool for CUDA? • 22

INTRODUCING THE SANITIZER API Released in CUDA 10.1 Callback host function Sanitizer callback API Memory Kernel launch Stream sync Host allocations Target Device code Sanitizer patching API Callback device function 23

SANITIZER API USAGE Callback API 1. Write a callback function using Sanitizer_CallbackFunc function signature static void ApiTrackerCallback( void* userdata, Sanitizer_CallbackDomain domain, Sanitizer_CallbackId cbid, const void* cbdata) { auto* pCallbackData = (Sanitizer_CallbackData*)cbdata; auto returnValue = *(cudaError_t*)pCallbackData->functionReturnValue; std::cout << "API call to " << pCallbackData->functionName <<" (return code " << returnValue << ")" << std::endl; } 24

SANITIZER API USAGE Callback API 2. Use sanitizerSubscribe to create a sanitizer subscriber sanitizerSubscribe(&handle, ApiTrackerCallback, userdata); 3. Use one of the sanitizerEnableCallback functions to enable/disable callbacks sanitizerEnableDomain(1, handle, SANITIZER_CB_DOMAIN_RUNTIME_API); As in the CUPTI API, only one subscriber can be active at any point. 25

API TRACKER ApiTrackerCallback Memory Kernel launch Stream sync allocations Target Device code 26

API TRACKER 27

SANITIZER API USAGE Patching API: device side 1. Write patches as __device__ functions using the SanitizerCallback function signature extern "C" __device__ __noinline__ SanitizerPatchResult MemoryAccessCallback( void* userdata, uint64_t pc, void* ptr, uint32_t accessSize, uint32_t flags) { auto* pTracker = (MemoryAccessTracker*)userdata; uint32_t old = atomicAdd(&(pTracker->currentEntry), 1); MemoryAccess& access = pTracker->accesses[old]; access.address = (uint64_t)(uintptr_t)ptr; access.accessSize = accessSize; access.flags = flags; return SANITIZER_PATCH_SUCCESS; } 28

SANITIZER API USAGE Patching API: device side 2. Compile patches in a cubin or fatbin using new PTXAS option – -compile-as-tools- patch $ nvcc --cubin --keep-device-functions – Xptxas – -compile-as-tools-patch MemoryTrackerPatches.cu – o MemoryTrackerPatches.cubin 29

SANITIZER API USAGE Patching API: host side 1. Load patches in a CUDA context using sanitizerAddPatches sanitizerAddPatchesFromFile("MemoryTrackerPatches.cubin", ctx); 2. Mark which instructions should be patched using sanitizerPatchInstructions sanitizerPatchInstructions(SANITIZER_INSTRUCTION_MEMORY_ACCESS, module, "MemoryAccessCallback"); 3. Once all instructions have been marked, use sanitizerPatchModule to patch a given module sanitizerPatchModule(pModuleData->module); 4. Use sanitizerSetCallbackData to specify user data passed to callbacks sanitizerSetCallbackData(stream, userdata); 30

MEMORY TRACKER 1. Load patches 2. Mark memory accesses to be patched Retrieve userdata 4. Set userdata 3. Patch module Module load Kernel launch Stream sync Host Target Device code MemoryAccessCallback 31

MEMORY TRACKER 32

WHERE TO GET THE TOOL Released in CUDA 10.1 under extras/Sanitizer folder • • Support for Windows, Linux (x64 only), OSX Support for Maxwell and newer GPUs • • Missing features: CDP (partial support) • CUDA graphs • • Atomics on short types • Get help with either: devtools-support@nvidia.com • • https://devtalk.nvidia.com/default/board/373/compute-sanitizer-api/ 33

SANITIZER SAMPLES Find the full samples code for the ApiTracker and MemoryTracker at: https://github.com/NVIDIA/compute-sanitizer-samples 34

DEVELOPER TOOLS AT GTC19 Talks: S9751: Accelerate Your CUDA Development with Latest Debugging and Code Analysis Developer Tools, Tue @9am S9866 - Optimizing Facebook AI Workloads for NVIDIA GPUs, Tue @9am S9345: CUDA Kernel Profiling using NVIDIA Nsight Compute, Tue @1pm S9661: Nsight Graphics - DXR/Vulkan Profiling/Vulkan Raytracing, Wed @10am S9503: Using Nsight Tools to Optimize the NAMD Molecular Dynamics Simulation Program, Wed @1pm Hands-on labs: L9102: Jetson Developer Tools Training Lab, Mon @9am, 11:30am L9124: Debugging and optimizing CUDA applications with Nsight products on Linux training lab, Tue @8am, 10am Connect with the Experts (where DevTools will be available): CE9123: CUDA & Graphics Developer Tools, Tue @2pm, Wed @3pm CE9137: Jetson Embedded Platform, Tue @12pm, 5pm, Wed @1pm, 4pm, Thu @12pm Podium: Demos of DevTools products on Linux, DRIVE AGX & Jetson AGX at the showfloor Tue @12pm – 7pm Wed @12pm – 7pm 35 Thu @11am – 2pm

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND - PowerPoint PPT Presentation

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND CODE ANALYSIS DEVELOPER TOOLS Aurelien Chartier & Steve Ulrich, March 19 th , 2019 Debugging Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio Steve Ulrich CUDA

Software Debugging: Past, Present, and Future Alessandro (Alex) Orso School of Computer

GDB CIM overview CIMug Ljubljana 2018 June 6 th 2018 Matija Gruden About GDB Leading

Programming assignment 1 walkthrough Tao PA 1 walkthrough: Part I VirtualBox Install

Who Am I An Economist and MBA. Computer enthusiast for the past 30 years. Someone

Next-Generation Debuggers For Reverse Engineering For Reverse Engineering The ERESI team

radare2 22 y/o french expat @ Luxembourg Food, Travel and Languages <3 I hate

Dynamic Data Excavation or: Gimme back my symbol table! Asia Slowinska Traian Stancescu

In Ken enya Presented by: Le Leonar ard d Mabel ele th June 26 26 th e 2020 TV White

Scalable Post-Mortem Debugging Abel Mathew CEO - Backtrace amathew@backtrace.io @nullisnt0

GFAL 2.0 Devresse Adrien CERN lcgutil team lcgutil-support@cern.ch What is GFAL 2.0 ? One

Performance Evaluation in Theia Compass Herv KABAMBA Michel Dagenais December 9, 2019

X(cross) Development System make AGL application development easier July 2017 Sbastien

Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas The Distributed

Overview of FY20 Certified Budget for the Commonwealth of Puerto Rico July 1, 2019 1. Executive

Distributed, virtual and real debugging of a MIPS SoC Martin Strubel section5.ch 02/2013

The Hello World GCC Front End Rafael Gustavo Sverzut Barbieri Avila de Esp ndola GPSL -

Container-based virtualization Process-level Extensively used in Lightweight virtualization

Neo4j and graph databases Presented By: Stephanie McIntyre Graph Databases: The Database Model

Outline Background 1 Xeon Phi Architecture 2 Programming Xeon Phi TM 3 Native Mode Offload

Investor Presentation Second Quarter 2016 Forward Looking Statements This presentation

RISC-V: Emulation and Rich, Non-Intrusive Analytics Address Verification Complexity Embedded

Trace Debugging in lowRISC lowRISC release v0.3 with Open SoC Debug Wei Song 1 , Stefan

Principled work fm ow-centric tracing of distributed systems Raja Sambasivan Ilari Shafer,

PeeringDB Arnold Nipper arnold@peeringdb.com 2017-03-23 CEE Peering Days 2017, Ljubljana,

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND - PowerPoint PPT Presentation

S9751: ACCELERATE YOUR CUDA DEVELOPMENT WITH LATEST DEBUGGING AND CODE ANALYSIS DEVELOPER TOOLS Aurelien Chartier & Steve Ulrich, March 19 th , 2019 Debugging Tools Nsight Eclipse Edition CUDA GDB Nsight Visual Studio Steve Ulrich CUDA

Software Debugging: Past, Present, and Future Alessandro (Alex) Orso School of Computer

GDB CIM overview CIMug Ljubljana 2018 June 6 th 2018 Matija Gruden About GDB Leading

Programming assignment 1 walkthrough Tao PA 1 walkthrough: Part I VirtualBox Install

Who Am I An Economist and MBA. Computer enthusiast for the past 30 years. Someone

Next-Generation Debuggers For Reverse Engineering For Reverse Engineering The ERESI team

radare2 22 y/o french expat @ Luxembourg Food, Travel and Languages &lt;3 I hate

Dynamic Data Excavation or: Gimme back my symbol table! Asia Slowinska Traian Stancescu

In Ken enya Presented by: Le Leonar ard d Mabel ele th June 26 26 th e 2020 TV White

Scalable Post-Mortem Debugging Abel Mathew CEO - Backtrace amathew@backtrace.io @nullisnt0

GFAL 2.0 Devresse Adrien CERN lcgutil team lcgutil-support@cern.ch What is GFAL 2.0 ? One

Performance Evaluation in Theia Compass Herv KABAMBA Michel Dagenais December 9, 2019

X(cross) Development System make AGL application development easier July 2017 Sbastien

Michael Q. Jones &amp; Matt B. Pedersen University of Nevada Las Vegas The Distributed

Overview of FY20 Certified Budget for the Commonwealth of Puerto Rico July 1, 2019 1. Executive

Distributed, virtual and real debugging of a MIPS SoC Martin Strubel section5.ch 02/2013

The Hello World GCC Front End Rafael Gustavo Sverzut Barbieri Avila de Esp ndola GPSL -

Container-based virtualization Process-level Extensively used in Lightweight virtualization

Neo4j and graph databases Presented By: Stephanie McIntyre Graph Databases: The Database Model

Outline Background 1 Xeon Phi Architecture 2 Programming Xeon Phi TM 3 Native Mode Offload

Investor Presentation Second Quarter 2016 Forward Looking Statements This presentation

RISC-V: Emulation and Rich, Non-Intrusive Analytics Address Verification Complexity Embedded

Trace Debugging in lowRISC lowRISC release v0.3 with Open SoC Debug Wei Song 1 , Stefan

Principled work fm ow-centric tracing of distributed systems Raja Sambasivan Ilari Shafer,

PeeringDB Arnold Nipper arnold@peeringdb.com 2017-03-23 CEE Peering Days 2017, Ljubljana,

radare2 22 y/o french expat @ Luxembourg Food, Travel and Languages <3 I hate

Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas The Distributed