RTX-RSim Accelerated Vulkan Room Response Simulation for Time-of-Flight Imaging Peter Thoman, Markus Wippler, Robert Hranitzky, and Thomas Fahringer peter.thoman@uibk.ac.at IWOCL 2020
Background and Motivation IWOCL 2020 – RTX-RSim 2
The Basic Idea
In room response simulation for time-of-flight imaging, we are interested in computing the propagation of light from a light source (L) through a room (defined by some geometry and surface properties G) to a sensor array (S).
In the real world, L and S are part of a time-of-flight (ToF) camera assembly.
The Goal
Unlike in, e.g., image rendering or lighting computations, the goal of the simulation is to compute a radiosity time series for each geometric primitive.
Based on this time series, which simulates the actual photons received by a ToF camera sensor, scene depth can be reconstructed.
With RSim, since the exact depth is known, different scenes and reconstruction schemes can be easily evaluated.
Use during development of better ToF hardware implementations or software algorithms.
[Figure: plot with axes labelled r and t]
Algorithm Overview
1. Read input data, including geometric primitives ($G$), their surface material information ($\rho$), and initial impulse
2. Pre-computation of the per-triangle area ($A_i$)
3. Mutual signal delay computation, storing the signal delay for each triangle pair $(i, j)$ in $\tau_{ij}$
4. Mutual visibility computation, evaluating the energy transfer between each triangle pair stochastically and storing in $K_{ij}$
5. For each timestep $t \in [0, T)$: propagate radiosity, computing $rad_{t,i}$ for each triangle $i$ in all pairs $(i, j)$ based on $K_{ij}$ and $rad_{t-1,i}$
6. Compute the distance from the light/sensor position to each triangle $i$, based on $rad_{[0,T),i}$
[Figure: two triangles $i$ and $j$, labelled with the area $A_i$ and the signal delay $\tau_{ij}$]
Algorithm Performance and Data Requirement Analysis
Algorithm Steps
1. Input data prep.
2. Pre-compute $A_i$
3. Pre-compute $\tau_{ij}$
4. Mutual visibility comp. $K_{ij}$
5. Radiosity propagation $rad_{[0,T),i}$
6. Compute distance
Analyse time complexity for each step of the algorithm.
Algorithm Steps: 1. Input data prep. and 2. Pre-compute $A_i$
Steps 1 and 2 iterate over $N$ triangles, with simple I/O operations and area computation for each element. Readily identified as $O(N)$ complexity.
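The per-triangle area pass can be sketched as below. This is a minimal illustration, assuming an indexed vertex buffer stored as plain Python lists; the function names `triangle_area` and `precompute_areas` are hypothetical and not taken from RSim.

```python
import math

def triangle_area(a, b, c):
    """Area of the 3D triangle (a, b, c): half the norm of the edge cross product."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def precompute_areas(vertices, indices):
    """One constant-time area computation per triangle, i.e. linear in the triangle count."""
    return [triangle_area(vertices[i0], vertices[i1], vertices[i2])
            for (i0, i1, i2) in indices]

# Hypothetical example data: two triangles sharing an edge.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
indices = [(0, 1, 2), (0, 1, 3)]
areas = precompute_areas(vertices, indices)
```

Each triangle is visited exactly once, which is where the linear complexity comes from.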
Algorithm Steps: 3. Pre-compute $\tau_{ij}$
Computing the propagation delay for each pair of triangles: $O(N^2)$.
However, the fixed factor is low, and compared to the remaining phases, even $O(N^2)$ complexity is largely negligible.
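A pairwise delay table could look like the following sketch. The slides do not state how the delay is measured, so this assumes (hypothetically) that it is the centroid-to-centroid distance divided by the speed of light, rounded to whole timesteps of length `dt`; the names `centroid` and `signal_delays` are illustrative only.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def centroid(tri):
    """Centroid of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(v[k] for v in tri) / 3.0 for k in range(3))

def signal_delays(triangles, dt):
    """Delay table in whole timesteps for every triangle pair (assumed centroid model)."""
    cents = [centroid(t) for t in triangles]
    n = len(cents)
    tau = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # the delay is symmetric, so compute each pair once
            steps = round(math.dist(cents[i], cents[j]) / (SPEED_OF_LIGHT * dt))
            tau[i][j] = tau[j][i] = steps
    return tau

# Hypothetical example: two parallel triangles about 3 m apart, 1 ns timesteps.
z = 2.99792458
tri_a = ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0))
tri_b = ((0.0, 0.0, z), (3.0, 0.0, z), (0.0, 3.0, z))
tau = signal_delays([tri_a, tri_b], dt=1e-9)
```

The double loop touches every pair once, so the cost is quadratic in the number of triangles, but each iteration is only a distance computation.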
Algorithm Steps: 4. Mutual visibility comp. $K_{ij}$
Stochastically evaluate the visibility between every pair of triangles. A naïve implementation requires a ray-triangle intersection check against all other triangles in the scene. With $S$ stochastic samples: $O(N^3 \cdot S)$.
In practice, use a geometric acceleration structure. Current RSim on CPU uses octrees, resulting in a reduction of average-case query complexity from $O(N)$ to $O(\log N)$.
Overall: $O(N^2 \cdot \log N \cdot S)$
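To make the naïve cost concrete, here is a minimal sketch of a brute-force visibility estimate for one triangle pair. It is not the RSim implementation: it only estimates the unoccluded fraction of sample segments (no energy-transfer term), uses a standard Möller-Trumbore segment test, and assumes the same hypothetical `(u, v)` sample coordinates are reused on both triangles.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def point_on_triangle(tri, u, v):
    """Map (u, v) in the unit square to a point on the triangle (folding the far half back)."""
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    a, b, c = tri
    return tuple(a[k] + u * (b[k] - a[k]) + v * (c[k] - a[k]) for k in range(3))

def segment_hits_triangle(p, q, tri, eps=1e-9):
    """Moller-Trumbore intersection test, restricted to the open segment p -> q."""
    a, b, c = tri
    d = sub(q, p)
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False  # segment is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(p, a)
    u = dot(s, h) * inv
    if u < 0.0 or u > 1.0:
        return False
    qv = cross(s, e1)
    v = dot(d, qv) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, qv) * inv
    return eps < t < 1.0 - eps

def visible_fraction(triangles, i, j, samples):
    """Fraction of sample segments between triangles i and j not blocked by any other triangle."""
    visible = 0
    for (u, v) in samples:
        p = point_on_triangle(triangles[i], u, v)
        q = point_on_triangle(triangles[j], u, v)
        blocked = any(segment_hits_triangle(p, q, triangles[k])
                      for k in range(len(triangles)) if k != i and k != j)
        visible += 0 if blocked else 1
    return visible / len(samples)

# Hypothetical scene: two facing triangles and a large occluder between them.
floor = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
ceiling = ((0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0))
occluder = ((-5.0, -5.0, 1.0), (5.0, -5.0, 1.0), (0.0, 5.0, 1.0))
samples = [(0.25, 0.25), (0.5, 0.25), (0.1, 0.6)]
open_scene = visible_fraction([floor, ceiling], 0, 1, samples)
blocked_scene = visible_fraction([floor, ceiling, occluder], 0, 1, samples)
```

Every sample scans all other triangles, so evaluating all pairs this way costs the cubic-times-samples bound quoted for the naïve case; an acceleration structure replaces the inner linear scan.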
Algorithm Steps: 5. Radiosity propagation $rad_{[0,T),i}$
Uses signal delay $\tau_{ij}$ and mutual visibility information $K_{ij}$, as well as the previous radiosity up to the currently computed timestep, $rad_{[0,t),i}$.
For each timestep $t$ and each pair $(i, j)$: propagate energy between the triangles in the pair from time $t - \tau_{ij}$ according to mutual visibility as well as their surface properties.
$O(N^2 \cdot T)$
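The loop structure of the propagation step can be illustrated as follows. The slides do not give the exact update rule, so this sketch assumes a simple gather with a scalar reflectivity per triangle and scalar radiosity values; `propagate_radiosity`, `rho`, and `impulse` are hypothetical names, and delays are clamped to at least one timestep so that a step only reads earlier results.

```python
def propagate_radiosity(K, tau, rho, impulse, T):
    """Return rad[t][i] for t in [0, T) under the assumed gather update."""
    n = len(rho)
    rad = [[0.0] * n for _ in range(T)]
    for t in range(T):
        # Inject the initial impulse (assumed to be given per timestep and triangle).
        if t < len(impulse):
            for i in range(n):
                rad[t][i] = impulse[t][i]
        # Gather delayed energy from every other triangle.
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                t_src = t - max(1, tau[i][j])
                if t_src >= 0:
                    rad[t][i] += rho[i] * K[i][j] * rad[t_src][j]
    return rad

# Hypothetical two-triangle example with a delay of two timesteps.
K = [[0.0, 0.5], [0.5, 0.0]]
tau = [[0, 2], [2, 0]]
rho = [1.0, 0.5]
impulse = [[1.0, 0.0]]
rad = propagate_radiosity(K, tau, rho, impulse, T=5)
```

The three nested loops (timesteps, then all ordered pairs) give the quadratic-in-triangles, linear-in-timesteps cost stated above. Because each timestep only reads earlier rows of `rad`, all triangles within one timestep can be updated in parallel.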
Algorithm Steps: 6. Compute distance
Distance computation is usually based on cross-correlation of radiosity time series: $O(N \cdot T^2)$.
T is usually much smaller than N, and the fixed factor is very small as well. Usually negligible overall, similar to step 3.
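A direct cross-correlation for one triangle might be sketched like this. It assumes, hypothetically, that the emitted impulse is correlated against the received series, that light source and sensor are co-located (so the lag corresponds to a round trip), and that timesteps have length `dt`; the names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def best_lag(emitted, received):
    """Lag (in timesteps) that maximises the cross-correlation of the two series."""
    T = len(received)
    best, best_score = 0, float("-inf")
    for lag in range(T):
        overlap = min(len(emitted), T - lag)
        score = sum(emitted[t] * received[t + lag] for t in range(overlap))
        if score > best_score:
            best, best_score = lag, score
    return best

def distance_from_series(emitted, received, dt):
    """Halve the round-trip path length implied by the best lag (assumed co-located L and S)."""
    return 0.5 * SPEED_OF_LIGHT * best_lag(emitted, received) * dt

# Hypothetical series: the emitted pulse returns three timesteps later.
emitted = [1.0, 0.5]
received = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = best_lag(emitted, received)
dist = distance_from_series(emitted, received, dt=1e-9)
```

Trying every lag against a series of length T costs up to T squared operations per triangle, which matches the bound given for this step when repeated for all triangles.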
Measured Performance
Scaling trend matches observations on algorithmic complexity.
Clearly, mutual visibility computation and radiosity simulation are the main priority.
[Figure: relative performance (Small = 1, axis from 0 to 120) of Mutual Visibility, Radiosity Simulation, and Other for the Small, Medium, and Large inputs]
Vulkan Raytracing and Compute for Room Response Simulation
Data Management
A Vulkan implementation needs to be massively data-parallel to be efficient, and we are constrained in the amount of data we can store on a GPU.
This calls for a data-centric view of the algorithm.
Data Management
Contents                       Format                   Size
Triangles ($G$)                Indexed vertex buffer    $N$
Material information ($\rho$)  3 * FP32                 $N$
Raytracing Buffers             Internal / opaque        $O(N)$
Sample Coordinates             2 * FP32                 $S$
Mutual Visibility ($K_{ij}$)   FP16                     $N^2$
Radiosity ($rad$)              4 * FP32                 $N \cdot T$
Distance                       FP32                     $N$
Generally, $S \ll T \ll N$, therefore $K_{ij}$ dominates. FP16 sufficient!
Signal delay $\tau_{ij}$ recomputed instead of stored.
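The table translates into a rough memory estimate as in the sketch below. The element sizes follow the formats listed on the slide (FP32 is 4 bytes, FP16 is 2 bytes); the triangle and raytracing buffers are left out because their exact sizes are not given, and the problem sizes used in the example are hypothetical.

```python
def buffer_sizes_bytes(n, t, s):
    """Approximate buffer sizes for n triangles, t timesteps, and s samples."""
    return {
        "material": n * 3 * 4,         # 3 * FP32 per triangle
        "samples": s * 2 * 4,          # 2 * FP32 per sample
        "mutual_visibility": n * n * 2,  # FP16 per triangle pair
        "radiosity": n * t * 4 * 4,    # 4 * FP32 per triangle and timestep
        "distance": n * 4,             # FP32 per triangle
    }

# Hypothetical problem size: 50,000 triangles, 1,000 timesteps, 64 samples.
sizes = buffer_sizes_bytes(n=50_000, t=1_000, s=64)
largest = max(sizes, key=sizes.get)
```

With these example numbers the mutual visibility matrix needs 5 GB in FP16, and storing it in FP32 would double that, which illustrates why the half-precision format matters for fitting the data on a GPU.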
Hardware Raytracing for Mutual Visibility
[Figure: schematic representation of the HW raytracing process. Build: Input Geometry and Dataset feed the Acceleration Structures (Top-level AS, Bottom-level AS), the Descriptor Set (buffers), and the Shader Binding Table (Raygen, Hit, Miss). Raytracing: Ray Generation, Acceleration Structure Traversal, then Closest Hit (hit) or Miss (no hit), with results written to $K_{ij}$. Legend: operation, fixed-function GPU operation, GPU data structures, RT shader, RT shader invocation.]
Hardware Raytracing for Mutual Visibility
Geometry is static, so we can optimize the AS build for traversal speed rather than build/update performance.
Hardware Raytracing for Mutual Visibility
Descriptor Set: our RT shaders require read-only access to $G$, $\rho$, and the Sample Coordinates buffer, as well as write access to $K_{ij}$.
Shaders: only require ray generation and a single hit and miss shader.
Hardware Raytracing for Mutual Visibility
Ray generation: generate $S$ rays for every pair of triangles (order independent, thus $N^2/2 - N$ required size, 1D grid).
Aggregate results and write to $K_{ij}$.
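One way to map a 1D launch index to an unordered triangle pair is sketched below. The enumeration order is an assumption, not taken from the slides: it covers every pair of distinct triangles exactly once, which gives N(N-1)/2 indices for N triangles, so the slide's own grid size may follow a different convention. The helper names are hypothetical.

```python
import math

def index_to_pair(k):
    """Map a linear index k >= 0 to a pair (i, j) with 0 <= i < j."""
    # j is the largest integer with j * (j - 1) / 2 <= k; isqrt keeps this exact.
    j = (1 + math.isqrt(8 * k + 1)) // 2
    i = k - j * (j - 1) // 2
    return i, j

def pair_to_index(i, j):
    """Inverse mapping for i < j."""
    return j * (j - 1) // 2 + i

def pair_count(n):
    """Number of unordered pairs of distinct triangles."""
    return n * (n - 1) // 2

# Hypothetical example: enumerate all pairs for six triangles.
n = 6
pairs = [index_to_pair(k) for k in range(pair_count(n))]
```

In a ray generation shader the same arithmetic would be applied to the launch index, so that each invocation handles one pair and the result is written once for both orderings.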
Hardware Raytracing for Mutual Visibility
Miss shader: trivial, simply set visible=false for use in raygen shader.
Closest hit: check if the expected triangle was hit.