GPU Accelerated Tandem Traversal of Blocked Bounding Volume Hierarchies
Jesper Damkjær and Kenny Erleben
{damkjaer,kenny}@diku.dk
Department of Computer Science, University of Copenhagen
October 2009
Traditional BVH Traversal
Two BVHs are traversed in tandem, using either a stack or a queue.
A descend rule selects which tree to descend, or both trees are descended simultaneously.
For each descent, the BVs in the visited node pair are tested for overlap.
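A minimal sequential sketch of this traversal, assuming axis-aligned boxes (AABBs) and a hypothetical `Node` type that holds a list of children (internal nodes) or a primitive id (leaves). When both nodes are internal it descends both trees simultaneously; otherwise it descends whichever node is internal. It reports leaf pairs whose boxes overlap, i.e. candidate pairs only, without any primitive-level test.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an AABB plus children (internal) or a primitive id (leaf)."""
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    prim: int = -1


def overlap(a: Node, b: Node) -> bool:
    # Two AABBs overlap iff their extents overlap on every axis.
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def tandem_traverse(root_a: Node, root_b: Node) -> List[Tuple[int, int]]:
    stack = [(root_a, root_b)]  # pending node pairs (LIFO)
    candidates = []
    while stack:
        x, y = stack.pop()
        if not overlap(x, y):
            continue  # prune: disjoint BVs cannot contain colliding primitives
        if not x.children and not y.children:
            candidates.append((x.prim, y.prim))  # leaf-leaf pair
        elif not y.children:
            stack.extend((cx, y) for cx in x.children)  # descend A only
        elif not x.children:
            stack.extend((x, cy) for cy in y.children)  # descend B only
        else:
            # Both internal: descend both trees simultaneously.
            stack.extend((cx, cy) for cx in x.children for cy in y.children)
    return candidates
```

Replacing the list with a `collections.deque` and popping from the left would give the queue (breadth-first) variant instead.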
Naive BVH on GPU
One pair of BVHs per thread.
Upper space bound for the stack: k(c − 1) max(height(A), height(B)), where c is the maximum cardinality (children per node) and k is the size of two BV node references.
Shared memory is too small and global memory is too slow.
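To see why per-thread stacks do not fit on chip, the slide's bound can be evaluated for example numbers. The values are assumptions for illustration only: 4 bytes per node reference (so k = 8 for a pair), c = 4, and trees of height 10 and 12. The 16 KB figure is the shared memory per multiprocessor on compute capability 1.x devices such as the GeForce 9800 generation.

```python
def stack_bound_bytes(k: int, c: int, height_a: int, height_b: int) -> int:
    """Per-thread stack bound k(c - 1) max(height(A), height(B)), as stated on the slide."""
    return k * (c - 1) * max(height_a, height_b)


SHARED_MEM_PER_SM = 16 * 1024  # bytes per multiprocessor, compute capability 1.x

# Assumed example: two 4-byte node indices per stack entry (k = 8), c = 4.
per_thread = stack_bound_bytes(k=8, c=4, height_a=10, height_b=12)
threads_that_fit = SHARED_MEM_PER_SM // per_thread
print(per_thread, threads_that_fit)  # prints: 288 56
```

Under these assumed numbers, the stacks of fewer than two 32-thread warps fit in one multiprocessor's shared memory.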
Use Blocks
1 block ≡ each node has 4 children.
If two blocks overlap ⇒ 16 new child pairs to test.
Less data to transfer and more work per thread.
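A sketch of the block test, assuming each block stores the AABBs of exactly 4 sibling nodes: testing two overlapping blocks means testing all 4 × 4 = 16 child pairs, which is the extra work one GPU thread would take on per pair. Boxes are hypothetical `(lo, hi)` tuples.

```python
from itertools import product
from typing import List, Tuple

Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # (lo, hi)
BLOCK_SIZE = 4  # children per block, as on the slide


def overlap(a: Box, b: Box) -> bool:
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def block_vs_block(block_a: List[Box], block_b: List[Box]) -> List[Tuple[int, int]]:
    """Test every child of block A against every child of block B (16 tests).

    Returns the (i, j) child indices of the overlapping pairs, i.e. the new
    pairs to examine in the next pass."""
    assert len(block_a) == BLOCK_SIZE and len(block_b) == BLOCK_SIZE
    return [(i, j)
            for (i, a), (j, b) in product(enumerate(block_a), enumerate(block_b))
            if overlap(a, b)]
```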
Use Double Buffered List
Stack/queue ⇒ double-buffered list.
Swap the input and output lists of pairs for the next pass.
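The pass structure can be sketched sequentially: each pass reads pairs from an input list and writes the surviving child pairs to an output list, and the two lists swap roles before the next pass. On a GPU each input pair would be handled by one thread; this Python version only models the data flow and uses a hypothetical `Node` type.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    lo: Tuple[float, float, float]
    hi: Tuple[float, float, float]
    children: List["Node"] = field(default_factory=list)
    prim: int = -1  # primitive id on leaves


def overlap(a: Node, b: Node) -> bool:
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def traverse_in_passes(root_a: Node, root_b: Node):
    front = [(root_a, root_b)]          # input buffer of the current pass
    back: List[Tuple[Node, Node]] = []  # output buffer
    hits, passes = [], 0
    while front:
        for x, y in front:  # on a GPU: one thread per input pair
            if not overlap(x, y):
                continue
            if not x.children and not y.children:
                hits.append((x.prim, y.prim))
            else:
                # A leaf is kept as is while the other tree descends.
                xs = x.children or [x]
                ys = y.children or [y]
                back.extend((cx, cy) for cx in xs for cy in ys)
        front, back = back, front  # swap the roles of the two buffers
        back.clear()               # the old input becomes the next output
        passes += 1
    return hits, passes
```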
Memory Trick Needed
Need Imaginary Nodes
Fewer than 4 children ⇒ fill with imaginary nodes.
Imaginary nodes take up space and add to the computation time ⇒ use them sparingly.
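One way to realize imaginary nodes, assumed here for illustration, is an inverted "empty" AABB (lower corner +∞, upper corner −∞) that fails every overlap test. Padding keeps every block at exactly 4 children, but the padded slots are still stored and still tested, which is the cost the slide warns about. `IMAGINARY`, `pad_block`, and `wasted_tests` are hypothetical names.

```python
import math
from typing import List, Tuple

Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # (lo, hi)
BLOCK_SIZE = 4

# Hypothetical imaginary node: an inverted box that can never overlap anything.
IMAGINARY: Box = ((math.inf,) * 3, (-math.inf,) * 3)


def overlap(a: Box, b: Box) -> bool:
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def pad_block(children: List[Box]) -> List[Box]:
    """Fill a block with fewer than 4 children up to exactly 4 using imaginary nodes."""
    if not 1 <= len(children) <= BLOCK_SIZE:
        raise ValueError("a block holds between 1 and 4 real children")
    return children + [IMAGINARY] * (BLOCK_SIZE - len(children))


def wasted_tests(block_a: List[Box], block_b: List[Box]) -> int:
    """Count the 16 pairwise tests that involve at least one imaginary node."""
    return sum(1 for a in block_a for b in block_b
               if a == IMAGINARY or b == IMAGINARY)
```

For example, a block with one real child tested against a block with two real children spends 14 of its 16 overlap tests on imaginary nodes.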
Blocks with Mixed Internal or Leaf Nodes
Not allowed ⇒ simpler code.
Internal Block versus Leaf Block
if collide(a, k) ⇒ push(e, k)
if collide(a, l) ⇒ push(e, k)
if collide(a, m) ⇒ push(e, k)
if collide(a, n) ⇒ push(e, k)
Redundant results ⇒ add an extra check to the code.
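A sketch of one reading of this slide, stated as an assumption: when internal node a is tested against the four leaves k, l, m, n of a leaf block, every hit pushes the same pair (e, k), where e stands for the block below a and k identifies the leaf block. The extra check pushes that pair at most once. The box representation and the `push_naive`/`push_checked` names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # (lo, hi)


def overlap(a: Box, b: Box) -> bool:
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def push_naive(a: Box, leaves: List[Box], pair, out: list) -> None:
    # One push per colliding leaf: up to 4 copies of the same pair.
    for leaf in leaves:
        if overlap(a, leaf):
            out.append(pair)


def push_checked(a: Box, leaves: List[Box], pair, out: list) -> None:
    # Extra check: push the pair once if any leaf collides with a.
    if any(overlap(a, leaf) for leaf in leaves):
        out.append(pair)
```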
The Test Setup
Three different configuration types: structured stack, unstructured pile, rock slide.
The Test Setup (Cont’d)
For each configuration type: increasing number of triangles per object and increasing number of objects.
Tested against Rapid; Rapid uses OBBs, we use AABBs.
No optimization of imaginary nodes in the BVHs (up to 33%).
Results
[Figure: time in seconds vs. triangles per object and number of objects for the Stack, Pile, and Rockslide configurations. Top row: Rapid on an Intel Quad CPU using one core. Bottom row: CUDA only, on a GeForce 9800 GX2 using one core.]
Stack (5-8), Pile (3-7), Slide (2)
Thanks
Questions?