

  1.  SQLite – Running entirely on the GPU  Sky Morey  Chief Architect @DEG  degdigital.com  Library: GpuEx

  2. How did we get here?  The team was working on an n-body simulator with a requirement to dynamically add, remove, backup, restore, and query the elements in the simulator. A SQL solution was the natural fit. DEG/15/5/ATS GTC :: April 2016

  3. Agenda  Part 1 – The Application – Strategies – Testing the Stack  Goals – Try the NuGet packages – Think about ways to use them in your own projects  Part 2 – Development Process – Runtime Layer – System Layer – Data Layer 2TLK/50M/Q

  4. Part 1

  5. The Application see it in action

  6. Start with why  Working on an n-body simulation  Had an engine running with some nodes  Next needed to – Insert, update and delete nodes – Backup and restore the nodes – Query the current state  Good fit for SQL  SQLite used as an embedded database  SQLite has a small code base  SQLite was under the MIT license  SQLite was best match ISTD/C/C+/REF/STA

  7. Block Diagram  [Figure: the CPU-only stack beside the CPU + GPU stack. CPU-only components: Runtime Host, Runtime, Sentinel, System, Win, Unix, SysEx, System.Data, Pager, B-Tree, VDBE, DataEx, Runtime-JimTcl, Runtime-TinyTcl. The CPU + GPU stack repeats these and adds Map, Gpu, …, dSql, JimTcl, TinyTcl.]

  8. NuGet Simplicity  GpuEx-TinyTcl  GpuEx-JimTcl  GpuEx-dSql  GpuEx-Runtime  GpuEx-Runtime.TinyTcl  GpuEx-Runtime.JimTcl  GpuEx-System  GpuEx-System.Data
     Compute    Microarch  Library             x86  x64  Exceptions
     None       CPU        {library}_cpu.lib   Yes  Yes
     11,12,13   Tesla      {library}_11.lib    Yes  -    Runtime only
     20,21      Fermi      {library}_20.lib    Yes  Yes
     30,32      Kepler     {library}_30.lib    Yes  Yes
     35,37      Kepler     {library}_35.lib    Yes  Yes
     50         Maxwell    {library}_50.lib    Yes  Yes
     52,53      Maxwell    {library}_52.lib    Yes  Yes
     60         Pascal     {library}_60.lib    Yes  Yes
     PKG/LIB35/ENV/PSAKE
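
The table above implies a lookup from compute capability to library file. A minimal sketch of that lookup follows. The function names and the "Runtime" library name used in the usage note are hypothetical, and rounding an unlisted capability down to the nearest listed build is an assumption of this sketch, not something the slides state.

```cuda
#include <cuda_runtime.h>
#include <string>

// Maps a compute capability (major * 10 + minor) to a library file name,
// following the table on slide 8. Rounding unlisted capabilities down to
// the nearest listed build is an assumption of this sketch.
std::string LibraryFor(const std::string &library, int cc) {
    const char *suffix = "cpu";            // no usable GPU: CPU build
    if (cc >= 60) suffix = "60";           // Pascal
    else if (cc >= 52) suffix = "52";      // Maxwell 52, 53
    else if (cc >= 50) suffix = "50";      // Maxwell 50
    else if (cc >= 35) suffix = "35";      // Kepler 35, 37
    else if (cc >= 30) suffix = "30";      // Kepler 30, 32
    else if (cc >= 20) suffix = "20";      // Fermi 20, 21
    else if (cc >= 11) suffix = "11";      // Tesla 11, 12, 13
    return library + "_" + suffix + ".lib";
}

// Queries a device with the CUDA runtime API and picks the matching build.
// Falls back to the CPU build if the query fails (hypothetical policy).
std::string LibraryForDevice(const std::string &library, int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return LibraryFor(library, 0);
    return LibraryFor(library, prop.major * 10 + prop.minor);
}
```

For example, `LibraryFor("Runtime", 37)` yields `Runtime_35.lib`, matching the 35,37 row.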

  9. Video :: NuGet Runtime

  10. Video :: NuGet TinyTcl

  11. Video :: NuGet DataEx

  12. Video :: NuGet dSql

  13. dSql Examples  Memory database  x86 or x64  DDL, DML – Create table – Insert – Select – Delete – Drop table  File database  x64 only  DML – Limit – Join – Aggregate

  14. Video :: Memory Database

  15. Video :: File Based Database

  16. Strategies single threaded limitation and using with CUDA

  17. Strategies  Intermixing – single/multi kernel calls  Execution plan – plan single, exec multi  Ganging – warp execution as single thread

  18. Strategy :: Intermixing  Intermix data probes with application kernels
      Kernel<<<N, 32>>>()
      Kernel<<<N, 32>>>()
      DataProbe<<<1, 1>>>()
      Kernel<<<N, 32>>>()
      Intermixing GPU-threaded application kernel calls with single-threaded SQLite kernel calls.
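
The launch sequence on slide 18 can be sketched as a small program. This is an illustration under assumptions: `StepKernel`, `DataProbe` and `RunIntermixed` are hypothetical stand-ins, the probe only scans an array where the real system would run single-threaded SQLite work, and managed memory (CUDA 6 or later) is used for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical application kernel: each thread advances one body by one step.
__global__ void StepKernel(float *pos, const float *vel, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i] += vel[i] * dt;
}

// Hypothetical data probe: a single thread inspects the state between steps.
// This is the slot where single-threaded database work would run.
__global__ void DataProbe(const float *pos, int n, float *maxPos) {
    float m = pos[0];
    for (int i = 1; i < n; ++i) m = fmaxf(m, pos[i]);
    *maxPos = m;
}

// Runs two steps, one probe, then a third step; returns the probed maximum.
// Assumes n >= 1. Launches on the same stream execute in issue order.
float RunIntermixed(int n) {
    float *pos, *vel, *maxPos;
    cudaMallocManaged(&pos, n * sizeof(float));
    cudaMallocManaged(&vel, n * sizeof(float));
    cudaMallocManaged(&maxPos, sizeof(float));
    for (int i = 0; i < n; ++i) { pos[i] = (float)i; vel[i] = 1.0f; }
    *maxPos = -1.0f;

    int blocks = (n + 31) / 32;
    StepKernel<<<blocks, 32>>>(pos, vel, n, 0.5f);
    StepKernel<<<blocks, 32>>>(pos, vel, n, 0.5f);
    DataProbe<<<1, 1>>>(pos, n, maxPos);          // sees the state after two steps
    StepKernel<<<blocks, 32>>>(pos, vel, n, 0.5f);
    cudaDeviceSynchronize();

    float result = *maxPos;
    cudaFree(pos); cudaFree(vel); cudaFree(maxPos);
    return result;
}
```

Because all four launches share one stream, the probe observes a consistent snapshot without extra synchronization.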

  19. Strategy :: Execution Plan  Build plan single-threaded  Execute plan multi-threaded
      plan = BuildPlan<<<1, 1>>>()
      ExecutePlan<<<N, 32>>>(plan)
      ExecutePlan<<<N, 32>>>(plan)
      SQLite execution plan generation is single-threaded, while plan execution is GPU-threaded.
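
A minimal sketch of the plan-once, execute-many pattern from slide 19. The `Plan` struct, the range predicate and `CountBetween` are hypothetical and far simpler than a real SQLite plan. Since a kernel cannot return a value, the slide's `plan = BuildPlan<<<1, 1>>>()` is written here with an output pointer.

```cuda
#include <cuda_runtime.h>

// Hypothetical plan: a range predicate. A real SQLite plan would be a
// VDBE program; this only illustrates the build/execute split.
struct Plan { int lo; int hi; };

// Built by a single thread, as in BuildPlan<<<1, 1>>>.
__global__ void BuildPlan(Plan *plan, int a, int b) {
    plan->lo = min(a, b);
    plan->hi = max(a, b);
}

// Executed by many threads, one row per thread.
__global__ void ExecutePlan(const Plan *plan, const int *rows, int n, int *matches) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && rows[i] >= plan->lo && rows[i] <= plan->hi)
        atomicAdd(matches, 1);
}

// Counts rows with values in [min(a,b), max(a,b)] among the values 0..n-1.
int CountBetween(int n, int a, int b) {
    Plan *plan; int *rows, *matches;
    cudaMallocManaged(&plan, sizeof(Plan));
    cudaMallocManaged(&rows, n * sizeof(int));
    cudaMallocManaged(&matches, sizeof(int));
    for (int i = 0; i < n; ++i) rows[i] = i;
    *matches = 0;

    BuildPlan<<<1, 1>>>(plan, a, b);
    ExecutePlan<<<(n + 31) / 32, 32>>>(plan, rows, n, matches);
    cudaDeviceSynchronize();

    int result = *matches;
    cudaFree(plan); cudaFree(rows); cudaFree(matches);
    return result;
}
```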

  20. Strategy :: Ganging  Singular warp default  Primary warp method  Per warp pattern  [Figure: warp threads T1 to T7 shown against rows labeled "single instruction", with malloc and return steps]  Introduce “Ganging” for single thread kernel acceleration. Ganging executes in 32 thread form with a primary thread and 31 supporting threads for localized search or computation acceleration.
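
One way to picture ganging is a warp-wide search in which all 32 lanes scan the data and only the primary lane carries on with the serial logic. The sketch below is an assumption about how such a helper could look, not GpuEx code; `GangFind`, `GangedLookup` and `FindWithGang` are hypothetical names, and the `__shfl_*_sync` intrinsics require CUDA 9 or later.

```cuda
#include <cuda_runtime.h>

// All 32 lanes of a warp search together; every lane returns the smallest
// matching index, or n if the key is absent.
__device__ int GangFind(const int *data, int n, int key) {
    int lane = threadIdx.x & 31;
    int best = n;                                   // n means "not found"
    for (int i = lane; i < n; i += 32)
        if (data[i] == key) { best = i; break; }    // first hit for this lane
    // Min-reduce the per-lane results down to lane 0.
    for (int offset = 16; offset > 0; offset >>= 1)
        best = min(best, __shfl_down_sync(0xffffffffu, best, offset));
    return __shfl_sync(0xffffffffu, best, 0);       // broadcast lane 0's value
}

// Launched as <<<1, 32>>>: lane 0 is the primary thread, the other 31 assist.
__global__ void GangedLookup(const int *data, int n, int key, int *result) {
    int idx = GangFind(data, n, key);
    if (threadIdx.x == 0)                           // primary thread continues alone
        *result = (idx < n) ? idx : -1;
}

// Searches the values 0, 3, 6, ... for key; returns its index or -1.
int FindWithGang(int n, int key) {
    int *data, *result;
    cudaMallocManaged(&data, n * sizeof(int));
    cudaMallocManaged(&result, sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = 3 * i;
    *result = -2;

    GangedLookup<<<1, 32>>>(data, n, key, result);
    cudaDeviceSynchronize();

    int r = *result;
    cudaFree(data); cudaFree(result);
    return r;
}
```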

  21. Testing the Stack how do we know it works?

  22. Testing Codebase  In-place and xUnit tests for unit testing  TCL scripts for unit and integration testing  Other patterns for automated testing

  23. End of Part 1  Try the NuGet packages  Think about ways to use them in your own projects TRNS/Q

  24. Part 2

  25. The Development Process how did this get built?

  26. Porting  SQLite is a single-threaded application, so it is implemented as such  Converted from C to C++ and CUDA, and segmented into three additive parts – Runtime – represents lower level operations with stdio/stdlib functions – System – OS layer abstraction and sentinel message bus – System.Data – SQLite core data engine

  27. Build and Package  Packaging for multiple architectures and compilation issues – Project file changes, and build scripts for multiple build targets and NuGet packaging, with variations of release\debug, and win32\x64, and cpu\gpu20-35 – Multiple libraries, and hard-coded lib building for CUDA – Multiple cu files combined to hide context switching costs for a faster build LIB

  28. Video :: Packaging

  29. Video :: Nvcc Context Switching Costs

  30. Video :: Ellipse

  31. File System  Native file system access with Sentinel – Sentinel is a host-to-device message bus that solves GPU access to host resources  Future: GPU-only device file system – Host file system calls through an IPC to GPU files – dcat, dcmp, dcp, dgrep, dls, dmkdir, dmore, dmv, drm, drmdir – dchgrp, dchmod, dchown IPC/CTX/INPROC

  32. Sentinel  [Figure: fopen, fprintf and fclose calls made on the GPU are relayed through a MEMORY message buffer to the CPU; sample message fields include "fopen.txt, w", the handle 0x1234, "The quick brow…" and the length 43]
      FILE *f = _fopen("fopen.txt", "w");
      _fprintfR(f, "The quick brown fox jumps over the lazy dog");
      _fcloseR(f);
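
The round trip in the Sentinel figure can be approximated with a single message slot in mapped, pinned host memory. This is a sketch under assumptions, not the Sentinel implementation: the `Message` layout, the state values and all function names are hypothetical, and the host "services" the request by replying with the payload length instead of performing real file I/O. It relies on the kernel and the host polling loop running concurrently.

```cuda
#include <cuda_runtime.h>
#include <atomic>
#include <cstring>

enum { IDLE = 0, REQUEST = 1, REPLY = 2 };

// One message slot in pinned host memory that the GPU can also address.
// The layout is hypothetical; the slides do not document the real format.
struct Message {
    volatile int state;    // IDLE -> REQUEST (set by device) -> REPLY (set by host)
    char text[64];         // request payload written by the device
    volatile int result;   // answer written by the host
    int seen;              // what the device read back, for checking
};

__global__ void DeviceClient(Message *msg) {
    const char text[] = "The quick brown fox jumps over the lazy dog";
    for (int i = 0; i < (int)sizeof(text); ++i) msg->text[i] = text[i];
    __threadfence_system();             // publish the payload before the flag
    msg->state = REQUEST;
    while (msg->state != REPLY) { }     // spin until the host answers
    msg->seen = msg->result;
}

// Returns the value the device received from the host, or -1 on failure.
int SentinelRoundTrip() {
    Message *host = nullptr, *dev = nullptr;
    if (cudaHostAlloc((void **)&host, sizeof(Message), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    memset(host, 0, sizeof(Message));
    cudaHostGetDevicePointer((void **)&dev, host, 0);

    DeviceClient<<<1, 1>>>(dev);
    // Poll for the request; the stream query also flushes the launch and
    // lets the loop exit if the kernel failed to start.
    while (host->state != REQUEST)
        if (cudaStreamQuery(0) != cudaErrorNotReady) break;

    int seen = -1;
    if (host->state == REQUEST) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        host->result = (int)strlen(host->text);   // stand-in for real servicing
        std::atomic_thread_fence(std::memory_order_seq_cst);
        host->state = REPLY;
        cudaDeviceSynchronize();
        seen = host->seen;
    }
    cudaFreeHost(host);
    return seen;
}
```

A production bus would need a queue of slots, per-thread addressing and timeouts; the single slot here only shows the request and reply handshake.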

  33. NuGet Packaging

  34. Tools ported  Lemon generator  TCL  Lemon parser generator, converted to a .NET tool with CUDA output  TCL ported to CUDA to run unit tests on hardware PKG/LIB35

  35. Layers The Runtime Layer

  36. Block Diagram  [Figure: the CPU-only stack (Runtime Host, Runtime, Sentinel) beside the CPU + GPU stack (Runtime Host, Runtime, Sentinel)] VCRT/HEAP/SENT

  37. Runtime  Separate heap for _printf, _throw, _assert data shuffling to host  C runtime replacement in GPU, like stdio/stdlib – Memory management, alloc, realloc, free and debugging – atof, atoi64, atoi, itoa methods – toupper, isupper, isspace, isalnum, isalpha, isdigit, isxdigit, isidchar, tolower methods – strcpy, strncpy, strcat, strchr, strstr, strcmp, strncmp, memcpy, memstr, memchr, memcmp, memmove, strlen, hextobyte methods – snprintf methods  Methods prefixed with “_” to avoid naming collisions SQLITE/DIY
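
To make the "_" prefix convention concrete, here is a sketch of two C runtime replacements callable from device code. The signatures and the `sketch` namespace are assumptions; the slides list the function names but not their declarations, and `ParseKernel` and `ParseOnDevice` are hypothetical helpers for exercising them.

```cuda
#include <cuda_runtime.h>

namespace sketch {  // hypothetical namespace, keeps the names out of global scope

// Device-callable strlen replacement.
__host__ __device__ inline int _strlen(const char *s) {
    int n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Device-callable atoi replacement: skips leading blanks, accepts one sign.
__host__ __device__ inline int _atoi(const char *s) {
    while (*s == ' ' || *s == '\t') ++s;
    int sign = 1;
    if (*s == '-' || *s == '+') { if (*s == '-') sign = -1; ++s; }
    int value = 0;
    while (*s >= '0' && *s <= '9') { value = value * 10 + (*s - '0'); ++s; }
    return sign * value;
}

}  // namespace sketch

// Single-threaded kernel that uses the replacements on the GPU.
__global__ void ParseKernel(const char *text, int *out) {
    out[0] = sketch::_strlen(text);
    out[1] = sketch::_atoi(text);
}

// Copies text to managed memory, parses it on the device, returns the integer.
int ParseOnDevice(const char *text) {
    int len = sketch::_strlen(text);
    char *buf; int *out;
    cudaMallocManaged(&buf, len + 1);
    cudaMallocManaged(&out, 2 * sizeof(int));
    for (int i = 0; i <= len; ++i) buf[i] = text[i];   // includes the terminator
    out[0] = out[1] = 0;

    ParseKernel<<<1, 1>>>(buf, out);
    cudaDeviceSynchronize();

    int value = out[1];
    cudaFree(buf); cudaFree(out);
    return value;
}
```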

  38. Layers The System Layer

  39. Block Diagram  [Figure: the CPU-only stack (Runtime Host, Runtime, Sentinel, System, Win, Unix, SysEx) beside the CPU + GPU stack (the same components plus Map and Gpu)] ISYS/GPU

  40. System  Host-only implementation of the Unix and Windows systems  Device-only version of a “map system” which sits in the GPU and shuffles messages back to the host version using Sentinel ABSTRACT/SENTINEL

  41. Layers The Data Layer

  42. Block Diagram  [Figure: the CPU-only stack (Runtime Host, Runtime, Sentinel, System, Win, Unix, SysEx, System.Data, Pager, B-Tree, VDBE, DataEx) beside the CPU + GPU stack (the same components plus Map and Gpu)]

  43. System.Data  B-Tree  Pager  VDBE  The stack had to be large: it would crash unless it was bigger than 5 MB CODEVOLUME/STACK

  44. Conclusion :: NVIDIA Ask  Thank you for your time  Ask: Add a native SATA / NVMe interface to the GeForce line. GPUASSET/DB
