MIXED REALITY FUSION
Sven Middelberg, Developer Technology Engineer
smiddelberg@nvidia.com
VIRTUAL REALITY DEPTH FUSION
THE SETUP
- Intel RealSense D435 + Vive Tracker: 848x480 depth stream at 90 FPS, 90 Hz tracker update rate
- NVIDIA GP100: 3584 CUDA cores, 16 GB HBM2 memory
- HTC Vive
TAKEAWAYS
Symbiosis of VR and depth fusion:
- How can we take advantage of the VR system to make depth fusion more robust?
- Which optimizations are necessary to simultaneously reconstruct a 90 fps depth stream and visualize it in stereo VR?
AGENDA
1. DEPTH FUSION IN A NUTSHELL
2. ROBUST MIXED REALITY FUSION
3. CUDA IMPLEMENTATION & OPTIMIZATIONS
DEPTH FUSION IN A NUTSHELL
[Pipeline diagram: NEW FRAME → Pose Estimation → Volumetric Fusion → Raycasting → VERTEX & NORMAL MAP, all operating on the VOLUMETRIC RECONSTRUCTION; the vertex & normal map feeds back into pose estimation for the next frame]
RECONSTRUCTION DATA STRUCTURE
Truncated Signed Distance Field (TSDF) over a voxel grid. Per voxel V_i, store:
- Truncated signed distance D_i, clamped to the truncation size μ (here μ = 1.0)
- Number of samples C_i
[Figure: 2D slice of the voxel grid with signed distances; values beyond μ are truncated to 1.0]
Example: a 6 x 6 x 4 m³ volume at 4 mm voxel size
⇒ 1500 x 1500 x 1000 voxel grid
⇒ 2.25 billion voxels ⇒ 16.76 GByte
SPARSE VOXEL HASHING
Nießner et al., 2013
- Voxel brick: N³ voxel cube
- Hash function h(b_x, b_y, b_z) maps from brick space to hash buckets
- Hash entry: 3D brick position, overflow list offset, brick atlas pointer
- A hash entry references actual brick memory within a preallocated brick atlas
VOLUMETRIC FUSION
For each voxel, with stored values D_i, C_i:
1. Project the voxel onto the image plane
2. Find the nearest depth
3. Compute the TSDF sample d
4. Update D_i, C_i:
   D_i ← (C_i · D_i + d) / (C_i + 1)
   C_i ← min(C_i + 1, C_max)
VERTEX & NORMAL MAP CONSTRUCTION
Raycasting
Two-stage raycasting:
- 1st stage: March the ray in steps of the truncation region size μ
- 2nd stage: March voxel by voxel
V*: Ray position at the zero-crossing
N*: Gradient of the TSDF at V*
POSE ESTIMATION
Given: depth image I; raycast pose P*, vertex & normal maps V*, N*
Find: pose P = (R|t) of I
- Construct a depth pyramid I_j, 0 ≤ j < L
- Extract camera-space vertices & normals V_j, N_j (e.g. V_0, N_0; V_1, N_1; V_2, N_2)
- Iterative coarse-to-fine minimization of the distance between V_j, N_j and V*, N*
- Initialize P with P*
POSE ESTIMATION
Point-Plane ICP
- Find correspondences (V_k^j, N_k^j) ↔ (V_k*, N_k*)
- Minimize the sum of squared point-plane distances over the incremental pose P_Δ:
  E(P_Δ) = Σ_k dist(P_Δ P V_k^j, V_k*, N_k*)²
- Update P ← P_Δ P
POSE ESTIMATION
Lie-Algebraic Parameterization
- SE(3): Lie group of poses (12 parameters)
- se(3): Lie algebra (6 parameters) ⇒ minimal parameterization!
- Substitute P_Δ = exp(δ):
  E(δ) = Σ_k dist(exp(δ) P V_k^j, V_k*, N_k*)²
- Mapping between SE(3) and se(3):
  exp: se(3) → SE(3)
  log: SE(3) → se(3)
[Pipeline diagram: NEW FRAME → Pose Estimation → Volumetric Fusion → Raycasting → VERTEX & NORMAL MAP, all operating on the VOLUMETRIC RECONSTRUCTION]
ROBUST MIXED REALITY FUSION
[Pipeline diagram, extended for mixed reality: NEW FRAME → Pose Estimation → Volumetric Fusion → Raycasting as before; additionally, the VR TRACKED POSE drives a Stereo Raycast of the VOLUMETRIC RECONSTRUCTION, whose output goes to the DISPLAY]
ROBUST MIXED REALITY FUSION
Cheap relocalization using the VR-tracked pose P_VR
- Requires registration of the reconstruction and VR coordinate systems
- Why not use P_VR directly?
ROBUST MIXED REALITY FUSION
[Comparison: using P_VR directly vs. regularized optimization towards P_VR]
REGISTRATION
Given pose pairs (P_i, P_i^VR), find the transformation T such that P_i = T · P_i^VR
Lie-algebraic approach:
"Lie-Algebraic Averaging for Globally Consistent Motion Estimation", Govindu, CVPR 2004
RELOCALIZATION & REGULARIZATION
- Initialize pose estimation with P_VR
- For x, y ≈ 0: log(exp(x) exp(y)) ≈ x + y
- Penalize the distance between P_VR and exp(δ) P:
  dist(δ) = log(exp(δ) P (P_VR)⁻¹) ≈ δ + log(P (P_VR)⁻¹)
  E'(δ) = E(δ) + dist(δ)ᵀ Σ⁻¹ dist(δ)
- Low computational overhead
CUDA IMPLEMENTATION & OPTIMIZATIONS
POSE ESTIMATION
Inputs: current pose P; pyramid maps V_j, N_j; raycast maps V*, N*; tracked pose P_VR

System setup (GPU):
- Find correspondences
- Set up the linear system (Σ_k J_kᵀ J_k) δ = −(Σ_k J_kᵀ r_k)

Regularization & solving (CPU):
- Add the regularization term to the system
- Solve for δ
- Update P ← exp(δ) P
POSE ESTIMATION
Baseline

__global__ void setupSystem(float* gSys, ...)
{
    int x = getPixX();
    int y = getPixY();
    if (findCorrespondence(x, y, ...))
    {
        float lSys[27];
        computeLocalSystem(lSys, ...);
        #pragma unroll
        for (int i = 0; i < 27; ++i)
        {
            atomicAdd(gSys + i, lSys[i]);
        }
    }
}

void solve(float* gSys, ...)
{
    float hostSys[27];
    cudaMemcpyAsync(hostSys, gSys, ...);
    cudaStreamSynchronize(stream);
    cudaMemsetAsync(gSys, 0, ...);
    addRegularization(hostSys, ...);
    float delta[6];
    solve(hostSys, delta);
    pose = exp(delta) * pose;
}
POSE ESTIMATION
Baseline
[Chart: baseline setupSystem kernel times of 36.4, 26.5 and 19.5 μs, presumably for the three pyramid levels]
POSE ESTIMATION
Warp-Aggregated Atomics

__global__ void setupSystem(float* gSys, ...)
{
    int x = getPixX();
    int y = getPixY();
    float lSys[27];
    initZero(lSys);
    if (findCorrespondence(x, y, ...))
    {
        computeLocalSystem(lSys, ...);
    }
    int lane = getLane();
    warpReduceSystem(lSys, lane);
    if (lane < 27)
    {
        atomicAdd(gSys + lane, lSys[0]);
    }
}

__device__ __forceinline__
void warpReduceSystem(float* lSys, int lane)
{
    #pragma unroll
    for (int i = 0; i < 27; ++i)
    {
        warpReduce(lSys[i], lane);
        if (lane == i)
            lSys[0] = lSys[i];
    }
}
POSE ESTIMATION
Warp-Aggregated Atomics
[Chart: warp-aggregated atomics reduce the kernel times from 36.4/26.5/19.5 μs to 3.7/3.6/3.0 μs — speedup 8.0]
POSE ESTIMATION
Minimizing CPU Overhead
[Timeline: one tracking iteration takes 126.5 μs]
POSE ESTIMATION
Minimizing CPU Overhead
Launch the next system setup kernel earlier
[Timeline: 79.5 μs]
POSE ESTIMATION
Minimizing CPU Overhead
Regularize & solve on the GPU:

__constant__ SE3 cPose;

void trackingStep(float* gSys, SE3* gPose, ...)
{
    cudaMemcpyToSymbolAsync(cPose, gPose, ...);
    setupSystem<<<...>>>(gSys, ...);
    solveAndUpdate<<<...>>>(gSys, gPose);
}

✓ Removes CPU ↔ GPU synchronization
✓ Removes CPU ↔ GPU copies
✓ Keeps GPU busy
POSE ESTIMATION
Minimizing CPU Overhead
[Timeline: 43.3 μs]
POSE ESTIMATION
Minimizing CPU Overhead
[Chart: kernel times per level — baseline 36.4/26.5/19.5 μs; warp-aggregated atomics 3.7/3.6/3.0 μs (speedup 8.0); GPU solver 2.4/3.0/3.0 μs (speedup 9.7)]