SQLite – Running entirely on the GPU
Sky Morey, Chief Architect @DEG, degdigital.com
Library: GpuEx
How did we get here?
The team was working on an n-body simulator with a requirement to dynamically add, remove, back up, restore, and query the elements in the simulator. A SQL solution was the natural fit.
GTC :: April 2016
Agenda
Part 1
– The Application
– Strategies
– Testing the Stack
Goals
– Try the NuGet packages
– Think about ways to use them in your own projects
Part 2
– Development Process
– Runtime Layer
– System Layer
– Data Layer
Part 1
The Application
See it in action
Start with why
Working on an n-body simulation
– Had an engine running with some nodes
– Next needed to:
  – Insert, update and delete nodes
  – Back up and restore the nodes
  – Query the current state
Good fit for SQL
– SQLite is used as an embedded database
– SQLite has a small code base
– SQLite is in the public domain
– SQLite was the best match
Block Diagram
[Diagram: CPU-only and CPU + GPU stacks. Runtime (Host, Sentinel); System (Win, Unix; the GPU side adds Map and Gpu); System.Data (Pager, B-Tree, VDBE); Runtime-JimTcl, Runtime-TinyTcl, DataEx, SysEx; the GPU side also shows dSql, JimTcl and TinyTcl]
NuGet Simplicity
Packages: GpuEx-Runtime, GpuEx-Runtime.TinyTcl, GpuEx-Runtime.JimTcl, GpuEx-TinyTcl, GpuEx-JimTcl, GpuEx-System, GpuEx-System.Data, GpuEx-dSql
Compute | Microarch | Library | x86 | x64 | Exceptions
None | CPU | {library}_cpu.lib | Yes | Yes |
11, 12, 13 | Tesla | {library}_11.lib | Yes | - | Runtime only
20, 21 | Fermi | {library}_20.lib | Yes | Yes |
30, 32 | Kepler | {library}_30.lib | Yes | Yes |
35, 37 | Kepler | {library}_35.lib | Yes | Yes |
50 | Maxwell | {library}_50.lib | Yes | Yes |
52, 53 | Maxwell | {library}_52.lib | Yes | Yes |
60 | Pascal | {library}_60.lib | Yes | Yes |
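To make the compute/library table concrete, the helper below picks the library file for a given compute capability. It is a minimal sketch that encodes only the rows listed in the table; the function name `libraryFileFor` and the library name "Runtime" used in the checks are hypothetical, and capabilities the table does not list (for example 6.1) return no match instead of guessing a fallback.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: map a compute capability (major * 10 + minor, e.g. 35
// for 3.5) to the library file named in the table. An empty capability means
// a CPU-only build. Capabilities not listed in the table yield std::nullopt.
std::optional<std::string> libraryFileFor(const std::string& library,
                                          std::optional<int> computeCapability) {
    assert(!library.empty());
    if (!computeCapability) return library + "_cpu.lib";
    const char* suffix = nullptr;
    switch (*computeCapability) {
        case 11: case 12: case 13: suffix = "11"; break;  // Tesla (Runtime only)
        case 20: case 21:          suffix = "20"; break;  // Fermi
        case 30: case 32:          suffix = "30"; break;  // Kepler
        case 35: case 37:          suffix = "35"; break;  // Kepler
        case 50:                   suffix = "50"; break;  // Maxwell
        case 52: case 53:          suffix = "52"; break;  // Maxwell
        case 60:                   suffix = "60"; break;  // Pascal
        default:                   return std::nullopt;   // not in the table
    }
    return library + "_" + suffix + ".lib";
}
```

For example, a compute 3.7 (Kepler) card maps to `Runtime_35.lib`, and a compute 5.3 card maps to `Runtime_52.lib`.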
Video :: NuGet Runtime
Video :: NuGet TinyTcl
Video :: NuGet DataEx
Video :: NuGet dSql
dSql Examples
Memory database (x86 or x64): DDL, DML
– Create table
– Insert
– Select
– Delete
– Drop table
File database (x64 only): DML
– Limit
– Join
– Aggregate
Video :: Memory Database
Video :: File Based Database
Strategies
The single-threaded limitation, and using SQLite with CUDA
Strategies
– Intermixing: single/multi-threaded kernel calls
– Execution plan: plan single-threaded, execute multi-threaded
– Ganging: warp execution as a single thread
Strategy :: Intermixing
Intermix data probes with application kernels:
Kernel<<<N, 32>>>();
Kernel<<<N, 32>>>();
DataProbe<<<1, 1>>>();
Kernel<<<N, 32>>>();
Multi-threaded GPU application kernel calls are intermixed with single-threaded SQLite kernel calls.
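On the GPU, kernels launched on the same stream run in launch order, so a single-threaded probe sees the state left by the preceding parallel kernel. The sketch below models that ordering in plain C++ on the CPU: `launchParallel` stands in for a `Kernel<<<N, 32>>>()` launch (one std::thread per logical thread, all joined before returning), and `dataProbe` stands in for the single-threaded `DataProbe<<<1, 1>>>()` that would call into SQLite. Both names and the toy position update are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for a multi-threaded kernel launch: run `body` once per thread
// index and wait for all of them, like kernels completing in stream order.
void launchParallel(int threads, const std::function<void(int)>& body) {
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (int t = 0; t < threads; ++t) pool.emplace_back(body, t);
    for (auto& th : pool) th.join();
}

// Stand-in for a single-threaded data probe: reads the whole state serially,
// the way a single-threaded SQLite call would.
double dataProbe(const std::vector<double>& positions) {
    double sum = 0.0;
    for (double p : positions) sum += p;
    return sum;
}

// Intermix parallel application steps with a serial probe after each step.
std::vector<double> simulate(int bodies, int steps) {
    std::vector<double> positions(bodies, 0.0);
    std::vector<double> probes;
    for (int s = 0; s < steps; ++s) {
        // Each thread updates only its own element, so there is no data race.
        launchParallel(bodies, [&](int i) { positions[i] += 1.0; });
        probes.push_back(dataProbe(positions));
    }
    return probes;
}
```

With 4 bodies and 3 steps, the probes record the totals 4, 8 and 12.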
Strategy :: Execution Plan
– Build the plan single-threaded
– Execute the plan multi-threaded
BuildPlan<<<1, 1>>>(plan);  // kernels return void, so the plan is written to device memory
ExecutePlan<<<N, 32>>>(plan);
ExecutePlan<<<N, 32>>>(plan);
SQLite execution plan generation stays single-threaded, while plan execution is GPU-threaded.
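A CPU-only C++ sketch of plan once, execute many: `buildPlan` runs once on a single thread (the role of `BuildPlan<<<1, 1>>>`), and `executePlan` runs the same read-only plan on several threads, each over a strided slice of the rows, then combines the partial results. The plan format (a conjunction of comparisons whose matching values are summed) is hypothetical and far simpler than a real SQLite VDBE program.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

enum class Op { Greater, Less };
struct Step { Op op; double arg; };
// Hypothetical compiled plan: SUM(v) over rows where every step holds.
struct Plan { std::vector<Step> steps; };

// Built once, single-threaded.
Plan buildPlan(double low, double high) {
    return Plan{{{Op::Greater, low}, {Op::Less, high}}};
}

bool matches(const Plan& plan, double v) {
    for (const Step& s : plan.steps) {
        if (s.op == Op::Greater && !(v > s.arg)) return false;
        if (s.op == Op::Less && !(v < s.arg)) return false;
    }
    return true;
}

// Executed by many threads sharing the same read-only plan. Each thread
// writes only its own partial sum, so no locking is needed.
double executePlan(const Plan& plan, const std::vector<double>& rows, int threads) {
    assert(threads > 0);
    std::vector<double> partial(threads, 0.0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < rows.size(); i += threads)
                if (matches(plan, rows[i])) partial[t] += rows[i];
        });
    }
    for (auto& th : pool) th.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Because the plan is immutable once built, any number of threads can execute it concurrently; only the per-thread results need combining.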
Strategy :: Ganging
[Diagram: singular warp (default) vs. primary warp method with a per-warp pattern; threads T1–T7; single-instruction steps malloc (m) and return, with compute steps (c) spread across the threads]
Ganging is introduced to accelerate single-threaded kernels. A ganged kernel runs as a full 32-thread warp: one primary thread plus 31 supporting threads that accelerate localized search or computation.
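The sketch below models a ganged search in plain C++: 32 lanes each scan a strided slice of the data, and the primary lane (lane 0) combines their results, keeping the lowest matching index so the answer equals a serial first-match search. The lanes run one after another here; on a GPU they would run in lockstep within one warp, and the combine step would typically be a warp-level reduction. The names `laneSearch` and `gangedFind` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;

// Work done by one supporting lane: first match in its strided slice, or -1.
long laneSearch(const std::vector<int>& data, int key, int lane) {
    for (std::size_t i = lane; i < data.size(); i += kWarpSize)
        if (data[i] == key) return static_cast<long>(i);
    return -1;
}

// Ganged search: every lane searches its slice, then the primary lane keeps
// the lowest matching index, which is the global first match.
long gangedFind(const std::vector<int>& data, int key) {
    long laneResult[kWarpSize];
    for (int lane = 0; lane < kWarpSize; ++lane)  // lockstep in a real warp
        laneResult[lane] = laneSearch(data, key, lane);
    long best = -1;                                // primary lane combines
    for (int lane = 0; lane < kWarpSize; ++lane)
        if (laneResult[lane] >= 0 && (best < 0 || laneResult[lane] < best))
            best = laneResult[lane];
    return best;
}
```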
Testing the Stack
How do we know it works?
Testing Codebase
– In-place and xUnit tests for unit testing
– Tcl scripts for unit and integration testing
– Other patterns for automated testing
End of Part 1
– Try the NuGet packages
– Think about ways to use them in your own projects
Part 2
The Development Process
How did this get built?
Porting
– SQLite is a single-threaded application, so the port is implemented single-threaded as well
– Converted from C to C++ and CUDA, and split into three additive parts:
  – Runtime: lower-level operations, i.e. the stdio/stdlib functions
  – System: OS-layer abstraction and the Sentinel message bus
  – System.Data: the SQLite core data engine
Build and Package
Packaging for multiple architectures, and compilation issues:
– Project file changes and build scripts for multiple build targets and NuGet packaging, covering release/debug, win32/x64 and cpu/gpu20-35 variations
– Multiple libraries, with hard-coded library builds for CUDA
– Multiple .cu files combined to hide nvcc context-switching costs for a faster build
Video :: Packaging
Video :: Nvcc Context Switching Costs
Video :: Ellipse
File System
Native file system access with Sentinel
– Sentinel is a host-to-device message bus that gives the GPU access to host resources
Future: a GPU-only device file system
– Host file system calls through an IPC to GPU files
– dcat, dcmp, dcp, dgrep, dls, dmkdir, dmore, dmv, drm, drmdir
– dchgrp, dchmod, dchown
Sentinel
[Diagram: the GPU and the CPU each show fopen, fprintf and fclose, exchanged through shared MEMORY]
– fopen (0 2 0): fopen.txt, w → 0x1234
– fprintf (2 0 0): 0x1234, The quick brow… → 43
– fclose (0 2 0): 0x1234 → 0
FILE *f = _fopen("fopen.txt", "w");
_fprintfR(f, "The quick brown fox jumps over the lazy dog");
_fcloseR(f);
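The fopen/fprintf/fclose round trip can be modeled on the CPU with two threads sharing one mailbox: the "device" thread writes a request and spins until the "host" thread has executed it and posted a result. This is a minimal sketch, not GpuEx's actual Sentinel message layout; the `Mailbox` fields, the opcodes, and the in-memory file table standing in for real host files are all hypothetical. Writing the sample sentence returns 43, the byte count shown for fprintf.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>
#include <functional>
#include <map>
#include <string>
#include <thread>

enum class Opcode { Open, Write, Close, Shutdown };
enum class State { Idle, Requested, Done };

// Hypothetical shared-memory mailbox between "device" and "host".
struct Mailbox {
    std::atomic<State> state{State::Idle};
    Opcode op{};
    int handle = 0;
    char text[128] = {};
    long result = 0;
};

// "Device" side: post one request, then spin until the host answers.
long post(Mailbox& mb, Opcode op, int handle, const char* text) {
    mb.op = op;
    mb.handle = handle;
    std::strncpy(mb.text, text ? text : "", sizeof(mb.text) - 1);
    mb.state.store(State::Requested, std::memory_order_release);
    while (mb.state.load(std::memory_order_acquire) != State::Done)
        std::this_thread::yield();
    long r = mb.result;
    mb.state.store(State::Idle, std::memory_order_release);
    return r;
}

// "Host" side: service requests against an in-memory file table until Shutdown.
void hostLoop(Mailbox& mb, std::map<std::string, std::string>& files) {
    std::map<int, std::string> open;  // handle -> file name
    int next = 1;
    for (;;) {
        while (mb.state.load(std::memory_order_acquire) != State::Requested)
            std::this_thread::yield();
        switch (mb.op) {
            case Opcode::Open:  // "w" mode: create or truncate
                open[next] = mb.text;
                files[mb.text].clear();
                mb.result = next++;
                break;
            case Opcode::Write: {
                auto it = open.find(mb.handle);
                if (it == open.end()) { mb.result = -1; break; }
                files[it->second] += mb.text;
                mb.result = static_cast<long>(std::strlen(mb.text));
                break;
            }
            case Opcode::Close:
                mb.result = open.erase(mb.handle) ? 0 : -1;
                break;
            case Opcode::Shutdown:
                mb.result = 0;
                break;
        }
        bool stop = (mb.op == Opcode::Shutdown);
        mb.state.store(State::Done, std::memory_order_release);
        if (stop) return;
    }
}
```

A caller starts `hostLoop` on its own thread (passing the mailbox and file table with `std::ref`), issues Open, Write and Close through `post`, and finally posts Shutdown before joining the host thread.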
NuGet Packaging
Tools ported
– Lemon generator: the Lemon parser generator (which SQLite uses to generate its SQL parser), converted to a .NET tool with CUDA output
– Tcl: ported to CUDA to run the unit tests on the hardware
Layers
The Runtime Layer
Block Diagram
[Diagram: CPU-only and CPU + GPU configurations of the Runtime layer (Host, Sentinel)]
Runtime
– Separate heap for shuffling _printf, _throw and _assert data to the host
– C runtime replacement on the GPU, like stdio/stdlib:
  – Memory management: alloc, realloc, free, and debugging
  – atof, atoi64, atoi, itoa methods
  – toupper, isupper, isspace, isalnum, isalpha, isdigit, isxdigit, isidchar, tolower methods
  – strcpy, strncpy, strcat, strchr, strstr, strcmp, strncmp, memcpy, memstr, memchr, memcmp, memmove, strlen, hextobyte methods
  – snprintf methods
– Methods prefixed with "_" to avoid naming collisions
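The sketch below shows what a few of these replacement functions might look like when written once for both host and device. The functions are illustrative re-implementations, not GpuEx's source. The `HD` macro expands to CUDA's `__host__ __device__` when nvcc compiles the file (nvcc defines `__CUDACC__`) and to nothing in a plain C++ build. They sit in a hypothetical `rt` namespace because identifiers beginning with an underscore are reserved at global scope in standard C++.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__  // compile for both CPU and GPU under nvcc
#else
#define HD                      // plain C++ build
#endif

namespace rt {  // hypothetical namespace for this sketch

HD inline bool _isspace(char c) { return c == ' ' || (c >= '\t' && c <= '\r'); }
HD inline bool _isdigit(char c) { return c >= '0' && c <= '9'; }

// Compare as unsigned char, like the C library strcmp.
HD inline int _strcmp(const char* a, const char* b) {
    while (*a && *a == *b) { ++a; ++b; }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}

// Skip leading whitespace, accept an optional sign, then read digits.
// Overflow is not handled in this sketch.
HD inline int _atoi(const char* s) {
    assert(s != nullptr);
    while (_isspace(*s)) ++s;
    int sign = 1;
    if (*s == '-' || *s == '+') { if (*s == '-') sign = -1; ++s; }
    int v = 0;
    while (_isdigit(*s)) v = v * 10 + (*s++ - '0');
    return sign * v;
}

}  // namespace rt
```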
Layers
The System Layer
Block Diagram
[Diagram: CPU-only stack with Runtime (Host, Sentinel), System (Win, Unix) and SysEx; the CPU + GPU stack adds a GPU-side System with Map and Gpu]
System
– Host-only implementations of the Unix and Windows systems
– Device-only "map system" that sits on the GPU and shuffles messages back to the host version using Sentinel
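A minimal C++ sketch of that split: one abstract `System` interface, a host implementation standing in for the Win/Unix systems, and a device-side `MapSystem` that owns no OS resources and only marshals each call into a message for a channel. In GpuEx that channel is Sentinel; here it is a plain function call. The interface, its single `append` operation, and the in-memory file store are hypothetical, loosely in the spirit of SQLite's VFS layer.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical OS-abstraction interface.
struct System {
    virtual ~System() = default;
    // Append data to a file and return the file's new size.
    virtual long append(const std::string& path, const std::string& data) = 0;
};

// Host-side implementation (stands in for Win/Unix); in-memory here.
struct HostSystem : System {
    std::map<std::string, std::string> files;
    long append(const std::string& path, const std::string& data) override {
        files[path] += data;
        return static_cast<long>(files[path].size());
    }
};

// Message carried across the channel.
struct Message { std::string path, data; };

// Device-side "map system": forwards every call as a message.
struct MapSystem : System {
    std::function<long(const Message&)> channel;
    explicit MapSystem(std::function<long(const Message&)> ch) : channel(std::move(ch)) {}
    long append(const std::string& path, const std::string& data) override {
        return channel(Message{path, data});
    }
};
```

Callers on the device only see `System&`, so the data engine above it does not need to know whether a call is served locally or shuffled to the host.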
Layers
The Data Layer
Block Diagram
[Diagram: CPU-only and CPU + GPU stacks. Runtime (Host, Sentinel); System (Win, Unix; the GPU side adds Map and Gpu); System.Data (Pager, B-Tree, VDBE); DataEx; SysEx]
System.Data
– B-Tree, Pager, VDBE
– The stack had to be big: it would crash unless it was larger than 5 MB
Conclusion :: NVIDIA Ask
Thank you for your time
Ask: add a native SATA / NVMe interface to the GeForce line.