A Fully GPU-Based Out-Of-Core Approach to Handle Large Volume Data
GPU Technology Conference, Munich, Germany, 09-11 Oct 2018
Nicolas Courilleau 1,2, Jonathan Sarton 1, Florent Duguet 1,3, Yannick Remion 1 and Laurent Lucas 1
1 – Université de Reims Champagne-Ardenne, France; 2 – Neoxia, France; 3 – Altimesh, France
Outline • Background and motivation • Previous works • Out-of-core model presentation • Model in action: application to visualization • Conclusion and outlook
Context [Figure labels: Local 3D data, Offshore, x TB, Teleworking] • Targets of the 3DNeuroSecure HPC project: • Interactive processing and visualization (virtual microscopy, DVR: direct volume rendering) of very large biomedical datasets • Accelerating drug discovery for Alzheimer's disease N. Courilleau et al., 2018-10-11, GTC 2018 Munich, E8246, Room 3
Problem statement • Designing out-of-core algorithms • Voxel representation → data volume far larger than (≫) CPU and GPU memory • Typical data sizes (regular 3D grids), by domain/application: • Mesh voxelization: 100 GB mesh → $4352^3$ voxels (RGBA, 32 bits) ≈ 330 GB • Histology to electron microscopy: 100 GB to several TB, and beyond
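The ≈ 330 GB figure for the voxelized mesh follows from grid size × bytes per voxel. The check below assumes RGBA stored as 8 bits per channel (4 bytes per voxel) with no compression.

```python
def volume_size_bytes(nx: int, ny: int, nz: int, bytes_per_voxel: int) -> int:
    """Raw (uncompressed) size in bytes of a regular 3D voxel grid."""
    return nx * ny * nz * bytes_per_voxel


# Mesh voxelization row: 4352^3 voxels, RGBA with 8 bits per channel (4 bytes per voxel).
size = volume_size_bytes(4352, 4352, 4352, 4)
print(f"{size:,} bytes = {size / 1e9:.1f} GB = {size / 2**30:.2f} GiB")
# prints: 329,705,848,832 bytes = 329.7 GB = 307.06 GiB
```

This is far beyond the memory of a single GPU. That gap is why the data must be bricked and streamed instead of loaded whole.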
Previous works
Previous works (timeline) • 2009: [Crassin et al.], ACM SIGGRAPH i3D • 2011: [Klaus Engel], IEEE Symposium on Large Data Analysis & Visualization • 2012: [Hadwiger et al.], IEEE Transactions on Visualization & Computer Graphics • 2013: [Fogal et al.], IEEE Symposium on Large Data Analysis & Visualization
Previous works… at a glance • Address translation taxonomy [Beyer et al. 2015]: • Bricking: page table look-up • Octree multi-resolution: tree traversal • Multi-resolution page table
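To make the "page table look-up" entry of the taxonomy concrete, here is a minimal sketch of a single-level look-up for a bricked, single-resolution volume. A Python dictionary stands in for the GPU page table, and `flat_lookup` is a hypothetical name. The look-up costs one table access per sample. An octree traversal, by contrast, costs one step per tree level. A multi-resolution page table keeps the constant cost and indexes one table per level of detail.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int, int]


def flat_lookup(page_table: Dict[Coord, Coord], voxel: Coord,
                brick_size: int) -> Optional[Tuple[Coord, Coord]]:
    """Single-level page table look-up for a bricked, single-resolution volume.

    page_table maps virtual brick coordinates to the brick's slot in the cache.
    Returns (cache slot, voxel offset inside the brick), or None on a cache miss.
    """
    x, y, z = voxel
    brick = (x // brick_size, y // brick_size, z // brick_size)
    offset = (x % brick_size, y % brick_size, z % brick_size)
    slot = page_table.get(brick)
    if slot is None:
        return None  # brick not resident: it would have to be requested
    return slot, offset


# One resident brick: virtual brick (1, 0, 2) lives in cache slot (5, 5, 5).
table = {(1, 0, 2): (5, 5, 5)}
print(flat_lookup(table, (40, 3, 70), 32))  # ((5, 5, 5), (8, 3, 6))
print(flat_lookup(table, (0, 0, 0), 32))    # None
```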
And NVIDIA: Pascal / Volta unified memory • GPU memory oversubscription (unified memory) • Limited by host memory size and OS limitations • Volume decomposition still needed • Using it on Volta requires: • NVIDIA Tesla V100 • IBM POWER9 • NVLink 2 (+ OS support for ATS, Address Translation Services) • Unix mmap • Linux kernel 4.16 or later • Limitations: • ATS over NVLink 2 requires POWER9 • NVLink 2 requires Tesla V100 • No page-fault control • No texture memory [Figure: Summit, DOE/SC/Oak Ridge National Laboratory] [Everything you need to know about unified memory, Nikolay Sakharnykh, GTC 2018]
Our contributions • GPU-based out-of-core data management • Multiresolution multilevel page table hierarchy • Managed entirely on the GPU • Any kind of application working on regular 3D grids of voxels: • Interactive visualization • On-demand data processing • Both at the same time • Reduced CPU–GPU communication • Complete pipeline, from storage to GPU • In addition: multi-OS support, from the Kepler architecture onward
Out-of-core model presentation
Data representation and storage • (1) Multiresolution: levels of detail • (2) Bricking: subdivision of each level, which allows the out-of-core approach • (1) + (2) = bricked multiresolution 3D pyramid [Figure: pyramid with levels 0, 1 and 2] • Bonus: data compression (LZ4, lossless, with real-time decompression)
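Here is a sketch of how such a bricked pyramid can be laid out, under several assumptions not taken from the slides. Level 0 is the full-resolution data. Each coarser level halves every axis, rounding up. Bricks are cubes of a fixed edge length, and the pyramid stops once a level fits in one brick. The 64-voxel brick edge is a hypothetical choice, and border (ghost) voxels are not modeled.

```python
from typing import List, Tuple

Dims = Tuple[int, int, int]


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def pyramid_layout(dims: Dims, brick: int) -> List[Tuple[Dims, Dims]]:
    """Levels of a bricked multiresolution pyramid, from level 0 (finest) upward.

    Each entry is (voxel dimensions of the level, bricks per axis). Each coarser
    level halves every axis (rounding up) until the level fits in a single brick.
    """
    levels = []
    nx, ny, nz = dims
    while True:
        bricks = (ceil_div(nx, brick), ceil_div(ny, brick), ceil_div(nz, brick))
        levels.append(((nx, ny, nz), bricks))
        if bricks == (1, 1, 1):
            return levels
        nx, ny, nz = ceil_div(nx, 2), ceil_div(ny, 2), ceil_div(nz, 2)


layout = pyramid_layout((4352, 4352, 4352), 64)
for level, (voxels, bricks) in enumerate(layout):
    print(level, voxels, bricks)
print("total bricks:", sum(bx * by * bz for _, (bx, by, bz) in layout))
```

With these assumptions, the $4352^3$ volume gives 8 levels and 359,539 bricks. Each brick can be compressed, loaded and cached independently.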
Multiresolution multilevel page table hierarchy [Figure, built in three steps: (a) brick cache alone; (b) multiresolution page table → brick cache; (c) multiresolution page directory → page table cache → brick cache]
Multiresolution multilevel page table hierarchy • Entry (of the multiresolution page directory or of the page table cache): • 3D coordinates of the block in the next cache level • + Flag: • Mapped • Unmapped • Empty [Figure: multiresolution page directory → page table cache → brick cache]
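An entry holds the 3D coordinates of a block in the next cache plus a three-state flag, so it can be packed into one 32-bit word. The layout below is a hypothetical choice, not the one from the talk: 2 flag bits and 10 bits per coordinate, which allows up to 1024 blocks per cache axis. The meaning given to EMPTY (no data to load, e.g. background) is also an assumption.

```python
from enum import IntEnum
from typing import Tuple


class Flag(IntEnum):
    UNMAPPED = 0   # block not resident in the next cache: the look-up is a cache miss
    MAPPED = 1     # coordinates point to a resident block in the next cache
    EMPTY = 2      # assumed: block holds no data, nothing needs to be loaded


COORD_BITS = 10                        # hypothetical: up to 1024 blocks per cache axis
COORD_MASK = (1 << COORD_BITS) - 1


def pack_entry(flag: Flag, x: int, y: int, z: int) -> int:
    """Pack flag (bits 30-31) and x, y, z (10 bits each) into a 32-bit word."""
    for c in (x, y, z):
        if not 0 <= c <= COORD_MASK:
            raise ValueError("coordinate out of range")
    return (int(flag) << 30) | (x << 20) | (y << 10) | z


def unpack_entry(entry: int) -> Tuple[Flag, int, int, int]:
    return (Flag(entry >> 30),
            (entry >> 20) & COORD_MASK,
            (entry >> 10) & COORD_MASK,
            entry & COORD_MASK)


e = pack_entry(Flag.MAPPED, 3, 1023, 7)
print(hex(e), unpack_entry(e))
```

A single 32-bit word per entry keeps the directory and page-table caches compact. A packed 3D texture or buffer can then store them on the GPU.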
Virtual addressing • Virtual volume navigation: address $= [m, q]$ • $m$ = level of detail • $q = (x, y, z) \in [0, 1)^3$, normalized 3D position [Figure: look-up path MRPD (multiresolution page directory) → PT1 (page table cache) → brick cache]
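The following is a minimal sketch of translating a virtual address $[m, q]$ through the two-level hierarchy (MRPD → page table cache → brick cache). Several details are assumptions: dictionaries stand in for the GPU caches, each page-table block covers `P` = 4 bricks per axis, and every name here is hypothetical. A look-up that does not reach a mapped brick returns the virtual brick that would have to be requested.

```python
import math
from typing import Dict, NamedTuple, Optional, Tuple

Coord = Tuple[int, int, int]
Entry = Tuple[str, Coord]   # (flag, coordinates of the block in the next cache)
P = 4                       # hypothetical: bricks per axis covered by one page-table block


class Lookup(NamedTuple):
    flag: str               # "mapped", "unmapped" (cache miss) or "empty"
    level: int
    brick: Coord            # virtual brick at level m (what to request on a miss)
    slot: Optional[Coord]   # brick position in the brick cache, when mapped
    voxel: Coord            # voxel offset inside the brick


def translate(m: int, q: Tuple[float, float, float], bricks_per_axis: Dict[int, Coord],
              mrpd: Dict[Tuple[int, Coord], Entry], pt_cache: Dict[Coord, Entry],
              brick_size: int) -> Lookup:
    n = bricks_per_axis[m]
    pos = [q[i] * n[i] for i in range(3)]                                # in brick units
    brick = tuple(min(n[i] - 1, math.floor(pos[i])) for i in range(3))   # clamp round-off
    voxel = tuple(min(brick_size - 1, int((pos[i] - brick[i]) * brick_size))
                  for i in range(3))
    # Level 1: multiresolution page directory, indexed by (level, page-table block).
    flag, pt_block = mrpd.get((m, tuple(b // P for b in brick)), ("unmapped", (0, 0, 0)))
    if flag != "mapped":
        return Lookup(flag, m, brick, None, voxel)
    # Level 2: page-table cache entry = block origin in the cache + offset inside the block.
    addr = tuple(pt_block[i] * P + brick[i] % P for i in range(3))
    flag, slot = pt_cache.get(addr, ("unmapped", (0, 0, 0)))
    if flag != "mapped":
        return Lookup(flag, m, brick, None, voxel)
    return Lookup("mapped", m, brick, slot, voxel)


bricks = {0: (8, 8, 8)}
mrpd = {(0, (1, 0, 1)): ("mapped", (2, 0, 0))}
pt = {(8, 0, 3): ("mapped", (5, 6, 7))}
print(translate(0, (0.5, 0.1, 0.9), bricks, mrpd, pt, 32))
print(translate(0, (0.5, 0.1, 0.9), bricks, {}, {}, 32))   # cache miss in the MRPD
```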
Cache miss • Same virtual address $[m, q]$ and look-up path (MRPD → PT1 → brick cache) as in virtual addressing • The look-up reaches an entry whose block is not resident in the next cache: cache miss [Figure: look-up path ending in a cache miss]
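The slides do not say how a miss is recorded. One common pattern, assumed here, is one request flag per brick of the pyramid, stored in a flat array with per-level offsets. Every thread that misses simply sets its brick's flag. Because all writers store the same value, duplicate misses collapse into one request without atomics. The function names are hypothetical, and a bytearray stands in for the GPU flag buffer.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def level_offsets(bricks_per_level: List[Coord]) -> List[int]:
    """Start of each level in one flat array of bricks; the last item is the total."""
    offsets, total = [], 0
    for nx, ny, nz in bricks_per_level:
        offsets.append(total)
        total += nx * ny * nz
    offsets.append(total)
    return offsets


def brick_index(level: int, brick: Coord, bricks_per_level: List[Coord],
                offsets: List[int]) -> int:
    nx, ny, _ = bricks_per_level[level]
    x, y, z = brick
    return offsets[level] + (z * ny + y) * nx + x


def report_miss(flags: bytearray, level: int, brick: Coord,
                bricks_per_level: List[Coord], offsets: List[int]) -> None:
    # Many GPU threads may miss on the same brick in one frame. They all store the
    # same value, so the write is idempotent and needs no atomic operation.
    flags[brick_index(level, brick, bricks_per_level, offsets)] = 1


levels = [(4, 4, 4), (2, 2, 2), (1, 1, 1)]
offsets = level_offsets(levels)        # [0, 64, 72, 73]
flags = bytearray(offsets[-1])         # one request flag per brick of the pyramid
report_miss(flags, 1, (1, 0, 1), levels, offsets)
report_miss(flags, 1, (1, 0, 1), levels, offsets)   # duplicate miss, same flag
print(flags.index(1), sum(flags))      # 69 1
```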
Pipeline 1 – Voxel cache request [Figure: out-of-core pipeline. Components: end-user application and application interface; multi-resolution page table hierarchy and data cache on the GPU, with LRU and hierarchy updates; cache manager request list; request handler (1 asynchronous thread); multi-level cache manager with a RAM brick cache on the CPU (localhost); mass storage holding levels L0, L1, L2 (brick positions); requests and requested bricks exchanged through CUDA zero-copy communication. Step 1 highlighted.]
Pipeline 2 – Hierarchy look-up [Same pipeline figure as step 1, step 2 highlighted]
Pipeline 2.1 – Request list creation [Same pipeline figure as step 1, step 2.1 highlighted]
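Turning scattered per-brick request flags into a dense request list is a stream compaction. On a GPU this is typically an exclusive prefix sum followed by a scatter; libraries such as CUB and Thrust provide parallel scans. The Python version below is a sequential stand-in for that scan. The per-frame cap `max_requests` is a hypothetical budget. `decode_index` maps a flat index back to (level, brick), assuming the flat per-level layout with level offsets.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Coord = Tuple[int, int, int]


def compact_requests(flags: bytearray, max_requests: int) -> List[int]:
    """Stream compaction of 0/1 flags: indices of set flags, in order, capped."""
    inclusive = list(accumulate(flags))        # inclusive prefix sum of the 0/1 flags
    out = [0] * (inclusive[-1] if inclusive else 0)
    for i, f in enumerate(flags):
        if f:
            out[inclusive[i] - 1] = i          # exclusive scan value = output position
    return out[:max_requests]


def decode_index(idx: int, bricks_per_level: List[Coord],
                 offsets: List[int]) -> Tuple[int, Coord]:
    """Inverse of the flat layout: index -> (level, brick coordinates)."""
    level = bisect_right(offsets, idx) - 1
    nx, ny, _ = bricks_per_level[level]
    local = idx - offsets[level]
    return level, (local % nx, (local // nx) % ny, local // (nx * ny))


levels = [(4, 4, 4), (2, 2, 2), (1, 1, 1)]
offsets = [0, 64, 72, 73]
flags = bytearray(73)
flags[3] = flags[69] = 1
requests = compact_requests(flags, max_requests=16)
print([decode_index(i, levels, offsets) for i in requests])
# [(0, (3, 0, 0)), (1, (1, 0, 1))]
```

After the list is built, the flags would be cleared for the next frame.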
Pipeline 2.2 – Asynchronous handling of the request list [Same pipeline figure as step 1, step 2.2 highlighted]
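The figure shows a request handler running in one asynchronous thread. A minimal sketch of that pattern follows. The frame loop posts request lists to a queue and returns immediately, while a single worker thread serves them and publishes loaded bricks on an output queue. The class name and the `serve` callback, which stands for the CPU-cache and disk look-up, are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple


class RequestHandler:
    """One background thread that serves request lists posted by the rendering loop."""

    def __init__(self, serve: Callable[[int], bytes]) -> None:
        self._serve = serve                      # fetches one brick (CPU cache or disk)
        self._inbox: "queue.Queue[Optional[List[int]]]" = queue.Queue()
        self.loaded: "queue.Queue[Tuple[int, bytes]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, requests: List[int]) -> None:
        """Called by the frame loop; returns immediately."""
        self._inbox.put(list(requests))

    def close(self) -> None:
        self._inbox.put(None)                    # sentinel: stop after pending batches
        self._thread.join()

    def _run(self) -> None:
        while True:
            batch = self._inbox.get()
            if batch is None:
                return
            for idx in batch:
                self.loaded.put((idx, self._serve(idx)))


handler = RequestHandler(lambda idx: bytes([idx % 256]))
handler.submit([3, 69])
handler.close()
print([handler.loaded.get_nowait() for _ in range(2)])   # [(3, b'\x03'), (69, b'E')]
```

Because rendering never waits on disk I/O, the GPU keeps drawing with whatever is resident while missing bricks arrive over later frames.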
Pipeline 2.3 – CPU cache look-up (simple cache) [Same pipeline figure as step 1, step 2.3 highlighted]
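The slide calls the RAM brick cache a "simple cache" without giving its eviction policy. The sketch below assumes least-recently-used eviction, built on `collections.OrderedDict`. The class name is hypothetical. A miss returns `None`, which is when step 2.4 (loading from mass storage) takes over.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUBrickCache:
    """CPU (RAM) brick cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._bricks: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        """Return the brick, or None on a miss (the caller then reads mass storage)."""
        brick = self._bricks.get(key)
        if brick is not None:
            self._bricks.move_to_end(key)          # mark as most recently used
        return brick

    def put(self, key: Hashable, brick: bytes) -> None:
        self._bricks[key] = brick
        self._bricks.move_to_end(key)
        if len(self._bricks) > self.capacity:
            self._bricks.popitem(last=False)       # evict the least recently used brick


cache = LRUBrickCache(capacity=2)
cache.put("a", b"1")
cache.put("b", b"2")
cache.get("a")              # "a" becomes most recently used
cache.put("c", b"3")        # evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))   # None b'1' b'3'
```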
Pipeline 2.4 – If not in the CPU cache: load from mass storage [Same pipeline figure as step 1, step 2.4 highlighted]
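Since bricks are stored LZ4-compressed, their on-disk sizes vary. A plain fixed stride therefore does not work, and an offset table is one simple way to locate them. The file layout below is an assumption: bricks are stored back to back, with `offsets[i]` giving the start of brick i. `io.BytesIO` stands in for the brick file. LZ4 decompression is left out because it needs a third-party library.

```python
import io
from typing import BinaryIO, List


def read_brick(f: BinaryIO, offsets: List[int], index: int) -> bytes:
    """Read brick `index` from a file where bricks are stored back to back.

    offsets[i] is the byte offset of brick i and offsets[-1] is the file size, so
    variable-size (e.g. compressed) bricks are supported.
    """
    start, end = offsets[index], offsets[index + 1]
    f.seek(start)
    data = f.read(end - start)
    if len(data) != end - start:
        raise EOFError(f"brick {index} is truncated")
    return data  # still compressed; decompression (LZ4 in the talk) is not shown here


blob = io.BytesIO(b"aaaa" + b"bb" + b"cccccc")   # stands in for the brick file
offsets = [0, 4, 6, 12]
print(read_brick(blob, offsets, 1), read_brick(blob, offsets, 2))   # b'bb' b'cccccc'
```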
Pipeline 2.5 – Load bricks into a CUDA zero-copy buffer [Same pipeline figure as step 1, step 2.5 highlighted]
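In CUDA, zero-copy memory is page-locked host memory allocated with `cudaHostAlloc(..., cudaHostAllocMapped)`. Kernels access it through the pointer returned by `cudaHostGetDevicePointer`. The slides do not say how bricks are arranged in that buffer. The Python sketch below assumes fixed-size slots holding decompressed bricks, plus a parallel list recording which brick each slot holds. A GPU kernel could then copy slot i into the brick cache and update the hierarchy. The `bytearray` only stands in for the pinned buffer, and the class name is hypothetical.

```python
from typing import List, Sequence, Tuple


class StagingBuffer:
    """Fixed-size brick slots in one contiguous buffer (stand-in for a zero-copy buffer)."""

    def __init__(self, slots: int, brick_bytes: int) -> None:
        self.slots = slots
        self.brick_bytes = brick_bytes
        self.buffer = bytearray(slots * brick_bytes)
        self.keys: List[int] = []              # which brick each filled slot holds

    def fill(self, bricks: Sequence[Tuple[int, bytes]]) -> int:
        """Copy as many (key, decompressed brick) pairs as fit; return how many."""
        self.keys.clear()
        for key, data in bricks[: self.slots]:
            if len(data) != self.brick_bytes:
                raise ValueError("decompressed bricks must have a fixed size")
            start = len(self.keys) * self.brick_bytes
            self.buffer[start:start + self.brick_bytes] = data
            self.keys.append(key)
        return len(self.keys)

    def slot(self, i: int) -> memoryview:
        return memoryview(self.buffer)[i * self.brick_bytes:(i + 1) * self.brick_bytes]


sb = StagingBuffer(slots=2, brick_bytes=3)
staged = sb.fill([(7, b"abc"), (9, b"def"), (11, b"ghi")])   # only 2 slots available
print(staged, sb.keys, bytes(sb.slot(1)))                     # 2 [7, 9] b'def'
```

Bricks that do not fit stay pending for the next transfer. Batching them into one mapped buffer avoids a separate host-to-device copy per brick.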