S7698: CanvoX: High-Resolution VR Painting for Large Volumetric Canvas. Yeojin Kim¹, Byungmoon Kim², Jiyang Kim¹ and Young J. Kim¹. ¹Ewha Womans University, ²Adobe Research. http://graphics.ewha.ac.kr/canvox/
Vector or Pixel? Tilt Brush, Quill, … Our System
Voxels? • Voxel = Volume + Pixel • Easy to manipulate and traverse • Allows recoloring, erasing strokes, and mixing colors • Allows semi-transparent strokes
2D Painting Represents Large Space [Figure: 2D painting on a 1600 × 1142 pixel canvas]
3D Voxel Canvas Will Be Very Limited [Figure: a canvas of 1G voxels is only 1000 voxels per side]
Deep Octree For Large Canvas With High Details • Canvas extent: 40 km • $2^{26} \times 2^{26} \times 2^{26} = 302{,}231{,}454{,}903{,}657{,}293{,}676{,}544$ cells of size $(0.3\,\mathrm{mm})^3$
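The cell count on this slide can be checked with a few lines of arithmetic. The sketch below also estimates what a dense grid at that resolution would cost, assuming 4 bytes (RGBA8) per cell; that per-cell size is an assumption made only for illustration, and it shows why a sparse octree is needed.

```python
# Cell count of a dense grid at the deepest octree level (depth 26, from the slide).
MAX_DEPTH = 26
cells_per_axis = 2 ** MAX_DEPTH
total_cells = cells_per_axis ** 3

# Assumption for illustration only: 4 bytes (RGBA8) per cell in a dense grid.
BYTES_PER_CELL = 4
dense_bytes = total_cells * BYTES_PER_CELL

print(f"{cells_per_axis:,} cells per axis")
print(f"{total_cells:,} cells in total")
print(f"dense RGBA8 storage: 2^{dense_bytes.bit_length() - 1} bytes")
```

Under that assumption a dense canvas would need $2^{80}$ bytes, far beyond any GPU memory.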
Challenges • Painting in Large Canvas with High Detail • Deep Level Octree • Dynamic Tree on GPU • Expensive Refinement and Coarsening Cost • 10,000 ~ 100,000 nodes are generated or deleted with a stroke • Volume Rendering in VR Environment • Real-time Ray Casting • e.g. HTC Vive: Resolution 1680 × 1512 × 2, 90+ fps • Accumulated error along the ray
Dynamic Octree Structure
Octree data must be kept on the GPU side: GPU-Only Octree vs. CPU & GPU Octree. GPU-Only Octree: • No CPU-GPU Transfer • Scatter limited in GLSL • Atomic for allocation • Balancing is not trivial. CPU & GPU Octree: • Need CPU-GPU Transfer • CPU: Memory Management • GPU: Data for Rendering
Octree Outline • Strong 2-to-1 Balanced Tree [Kim15] • Simple Primal-only Tree • 1D array fields • Maximum Depth Level: 26 • Physical Unit: $(0.3\,\mathrm{mm})^3$ ~ $(40\,\mathrm{km})^3$ [Diagram: tree cells of the CPU Side Octree are sent as updates to the GPU Side Octree] [Kim15] Byungmoon Kim, Panagiotis Tsiotras, Jeong-Mo Hong, and Oh-young Song, Interpolation and parallel adjustment of center-sampled trees with new balancing constraints
Octree Outline • GPU Side Octree • 1D Array Fields (CPU) → 2D Texture (GPU) • Size of texture: 64M • Allow Both Up & Down Traversal • Textures: IDs & Flag (32bit × 3 INT), RGBA (8bit × 4 UBYTE), Interpolation Table Index (16bit × 4 INT), Interpolation Weight Table (16bit × 2 FLOAT) [Diagram: tree cells of the CPU Side Octree are sent as updates to the GPU Side Octree]
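The mapping from a 1D array field on the CPU to a 2D texture on the GPU can be sketched as a simple index conversion. The texture dimensions below (8192 × 8192, which gives 64M texels) and the function names are assumptions for illustration; the slides only state the total size.

```python
# Assumed texture layout: 8192 x 8192 = 64M texels (the slide only says "64M").
TEX_WIDTH = 8192
TEX_HEIGHT = 8192

def cell_to_texel(cell_id):
    """Map a 1D cell index to (x, y) texel coordinates, row by row."""
    if not 0 <= cell_id < TEX_WIDTH * TEX_HEIGHT:
        raise IndexError("cell_id outside the texture")
    return cell_id % TEX_WIDTH, cell_id // TEX_WIDTH

def texel_to_cell(x, y):
    """Inverse mapping from texel coordinates back to the 1D cell index."""
    return y * TEX_WIDTH + x

print(cell_to_texel(8193))
print(texel_to_cell(*cell_to_texel(123456789 % (TEX_WIDTH * TEX_HEIGHT))))
```

With a row-major layout like this, cells that are adjacent in the 1D array stay adjacent in the texture, so a local change touches a small rectangle of texels.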
Tree Synchronization: CPU – GPU Transfer • Coarsen / refine tree cells, then synchronize? • Block: M × N Texels • Drawing a stroke causes local changes in space
One-level Refinement and Coarsening, Frame $t_0$ [Figure: grid of cells labeled 0: outside, 1: boundary, 2: inside]
One-level Refinement and Coarsening, Frame $t_1$ [Figure: the same grid refined by one level, cells labeled 0: outside, 1: boundary, 2: inside]
One-level Refinement and Coarsening, Frame $t_n$, $n \le$ # Max Depth
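The refinement half of this scheme can be sketched as follows: in every frame, each leaf cell classified as boundary is split by exactly one level, so reaching the deepest level takes at most as many frames as the maximum depth. This is a simplified 2D (quadtree) sketch with a circular brush; the brush shape, the function names, and the omission of coarsening and 2-to-1 balancing are all simplifications for illustration, not the presenters' implementation.

```python
import math

OUTSIDE, BOUNDARY, INSIDE = 0, 1, 2  # labels used on the slides

def classify(level, i, j, center, radius):
    """Classify a square cell of the unit canvas against a circular brush."""
    size = 1.0 / (1 << level)
    x0, y0 = i * size, j * size
    x1, y1 = x0 + size, y0 + size
    cx, cy = center
    # Distance from the brush center to the nearest and farthest point of the cell.
    nearest = math.hypot(cx - min(max(cx, x0), x1), cy - min(max(cy, y0), y1))
    farthest = math.hypot(max(abs(cx - x0), abs(x1 - cx)),
                          max(abs(cy - y0), abs(y1 - cy)))
    if nearest > radius:
        return OUTSIDE
    if farthest <= radius:
        return INSIDE
    return BOUNDARY

def refine_one_level(leaves, center, radius, max_depth):
    """One frame: split every boundary leaf once; leave all other leaves alone."""
    result = set()
    for level, i, j in leaves:
        if level < max_depth and classify(level, i, j, center, radius) == BOUNDARY:
            for di in (0, 1):
                for dj in (0, 1):
                    result.add((level + 1, 2 * i + di, 2 * j + dj))
        else:
            result.add((level, i, j))
    return result

MAX_DEPTH = 4  # small depth so the example runs instantly
leaves = {(0, 0, 0)}
depth_per_frame = []
for frame in range(MAX_DEPTH):
    leaves = refine_one_level(leaves, center=(0.5, 0.5), radius=0.3, max_depth=MAX_DEPTH)
    depth_per_frame.append(max(level for level, _, _ in leaves))
print(depth_per_frame, len(leaves))
```

Spreading the work over frames bounds how many nodes change per frame, at the cost of the stroke reaching full detail a few frames late.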
Update Tree to GPU, Frame $t_0$ [Figure: cell grid (0: outside, 1: boundary, 2: inside) overlaid with texture blocks labeled by block ID 1 to 6]
Update Tree to GPU, Frame $t_1$ [Figure: refined cell grid (0: outside, 1: boundary, 2: inside) overlaid with texture blocks labeled by block ID 1 to 6]
Update Tree on GPU [Figure: cell grids at frames $t_0$, $t_1$, and $t_n$ with the cells to be updated highlighted] • Changed blocks are pushed to the Update Block ordered set (push block ID: 0, push block ID: 1) • Ordered set contents in the figure: block ID 0 1 2 7 3 11 • Main Thread: pop block ID • Update one block in every frame
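The block update scheme can be sketched as a set of dirty block IDs that is filled when cells change and drained by one block per frame. The block size (64 × 64 texels), the texture width, the insertion-ordered behavior of the set, and the names `block_of_cell` and `DirtyBlocks` are assumptions for illustration; the slides only specify "M × N texels" and "ordered set".

```python
import threading

TEX_WIDTH = 8192            # assumed texture width
BLOCK_W, BLOCK_H = 64, 64   # assumed block size (the slides say M x N texels)
BLOCKS_PER_ROW = TEX_WIDTH // BLOCK_W

def block_of_cell(cell_id):
    """Return the ID of the texture block that contains a cell's texel."""
    x, y = cell_id % TEX_WIDTH, cell_id // TEX_WIDTH
    return (y // BLOCK_H) * BLOCKS_PER_ROW + (x // BLOCK_W)

class DirtyBlocks:
    """Set of block IDs awaiting upload; duplicates are ignored."""

    def __init__(self):
        self._pending = {}              # dict keys keep insertion order
        self._lock = threading.Lock()   # pushes and pops may come from different threads

    def push(self, block_id):
        with self._lock:
            self._pending.setdefault(block_id, None)

    def pop(self):
        """Remove and return the oldest pending block ID, or None if empty."""
        with self._lock:
            if not self._pending:
                return None
            block_id = next(iter(self._pending))
            del self._pending[block_id]
            return block_id

dirty = DirtyBlocks()
for cell_id in (5, 70, 6, 64 * TEX_WIDTH):   # cells touched by a stroke
    dirty.push(block_of_cell(cell_id))

uploaded = []
for frame in range(4):                        # one block per rendered frame
    block_id = dirty.pop()
    if block_id is not None:
        uploaded.append(block_id)             # a real renderer would upload the block's texels here
print(uploaded)
```

Uploading a single block per frame keeps the per-frame transfer cost constant regardless of how many cells a stroke changed.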
Real-time Drawing
Volume Rendering
Volume Rendering? • Triangle Mesh Generation: performance, accuracy, and transparency problems • Slicing: we tested octree texture interpolation, which was slow • Splatting: a splat should be bigger than a cell, causing loss of resolution; performance • Ray Casting
Ray Casting with Large Canvas • Problem 1: Tree traversal from root to leaf at every sample point • Problem 2: Useless sample points at empty space → Traverse up when the cell is empty • Problem 3: Error increases along the ray • Problem 4: Rendering can be slow
From Root to Leaf? [Figure: visiting the cell at P0 and the cell at P1 along the ray] • (6~24 neighbors) × (# cells) = ….?
From Root to Leaf? • Thanks to the 2-to-1 balanced tree, a cell always has 6 neighbors • 3 neighbors share the parent (their IDs can be computed using an offset) • 3 neighbors have a different parent → If we precompute only these 3 neighbors, we can move to the next neighbor directly [Figure: given cell $C$ with neighbors $N_0$ to $N_3$; legend: neighbors which share the parent, neighbors which have a different parent]
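The neighbor lookup can be sketched as follows. Along each axis, one of a cell's two neighbors lies inside the same parent and is reached by flipping one bit of the cell's child index; the other lies outside the parent and must be stored. The memory layout (8 children stored contiguously, octant bits 0/1/2 for x/y/z) and the names `neighbor_id` and `precomputed` are assumptions for illustration, not the layout used by the presenters.

```python
def neighbor_id(first_child, octant, axis, direction, precomputed):
    """Return the ID of the neighbor of a cell along `axis` in `direction` (+1 or -1).

    first_child: ID of the first of the 8 sibling cells (assumed contiguous).
    octant:      child index 0..7 of this cell; bit `axis` gives its side of the parent.
    precomputed: 3 stored IDs, one per axis, for neighbors under a different parent.
    """
    bit = 1 << axis
    # The sibling lies in the +axis direction if this cell is on the low side.
    inward = +1 if (octant & bit) == 0 else -1
    if direction == inward:
        return first_child + (octant ^ bit)   # same parent: computed from an offset
    return precomputed[axis]                  # different parent: table lookup

# Cell with octant 0b101 (high x, low y, high z) among siblings 8..15.
stored = [100, 200, 300]                      # hypothetical IDs of the 3 stored neighbors
print(neighbor_id(8, 0b101, axis=0, direction=-1, precomputed=stored))
print(neighbor_id(8, 0b101, axis=1, direction=+1, precomputed=stored))
print(neighbor_id(8, 0b101, axis=0, direction=+1, precomputed=stored))
```

Storing 3 IDs per cell instead of 6 halves the neighbor table while still letting the ray step to the next cell without restarting from the root.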
Foveal Region
QuadTree Render Target: $\frac{W}{4} \times \frac{H}{4}$, $\frac{W}{2} \times \frac{H}{2}$, $W \times H$
CanvoX Model [Diagram] • HMD: View, Position • Controller / Haptic Device: Brush, Position • GPU: Update 3-neighbor texture; Render heat map; Render Scene (Ray Casting); Update Scene Quad Tree; Interpolate Scene Quad Tree; Render • CPU Main Thread: Initialize Octree; Update View matrix & controller Pos.; Store Stroke Data; Update Texture Block to GPU • Paint Thread: Color and Mark cell; One-level Refine/Coarsen marked cell • Octree: Parent ID, Child ID, flag, RGBA, Temp0 • Update Block Ordered Set: block ID • Stroke Data: Stroke ID, Segment ID
Implementation Detail • HMD: HTC Vive • CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz • RAM: 16 GB • GPU: NVIDIA GeForce GTX 980 Ti • OS: Windows 10 64-bit • Libraries: OpenGL 4.3, OpenVR, Grizzly [Kim15]
Summary • Dynamic and Simple Octree both on CPU and GPU • Shadow octree on GPU maintained by local changes • One-level refine/coarsen strategy • Real-time Ray Casting in Large Canvas • Fast tree traversal at samples using tree connectivity • Fast rendering using Quadtree-based Foveated Rendering • Minimize floating point error using local coordinates
Future Work • Performance Optimization • Adaptive 3D Interpolation • GPU-only Octree • Isosurface Rendering • More artistic tools
Low Precision Computation
Numbers
Numbers • New algebra with finite numbers someday? • This will be a breakthrough in math • Until then, we should live with floating point • Approximation to real field • (a+b)+c ≈ a+(b+c) • Extension to real field • Inf, NaN • We may probably establish an extension • Nonstandard Analysis • Finite Extended Ordered Field?
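Both properties listed on this slide are easy to observe. The sketch below uses Python's double precision floats to show that addition is only approximately associative, and that Inf and NaN extend the reals with values that break ordinary field rules.

```python
import math

# Approximation to the real field: addition is not exactly associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, abs(left - right))

# Extension to the real field: Inf and NaN.
print(math.inf + 1.0 == math.inf)      # Inf absorbs finite additions
print(math.isnan(math.inf - math.inf)) # Inf - Inf has no meaningful value
print(math.nan == math.nan)            # NaN is not equal even to itself
```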
Decreasing Precision • CPU had 80-bit registers to hold intermediate extended precision (do you use long double?) • Single precision: SSE, AVX, … • 24-bit float is found in GPU • Half-float is common in GPU: mobile GPUs and the highest-end GP100 • 8-bit representation is also common in GPU: UNORM/SNORM
Knowing Numerical Error • Given $x$, the floating point representation has an error proportional to $x$ • $\mathrm{fl}(x) = x(1+e)$, $|e| \le 1.19 \times 10^{-7}$ • Numerical error (to first order in the $e_i$): • $\mathrm{fl}(x+y) = (x+y)(1+e_1)$, $|e_1| \le 1.19 \times 10^{-7}$ • $\mathrm{fl}(\mathrm{fl}(x+y)+z) = ((x+y)(1+e_1)+z)(1+e_2) = (x+y)(1+e_1+e_2+e_1 e_2) + z(1+e_2) \approx (x+y+z)(1+2e_3)$, $|e_3| \le 1.19 \times 10^{-7}$ • $\mathrm{fl}(a / \mathrm{fl}(\mathrm{fl}(x+y)+z)) = \frac{a}{(x+y+z)(1+2e_3)}(1+e_4) \approx \frac{a}{(x+y+z)(1+3e_5)}$, $|e_5| \le 1.19 \times 10^{-7}$
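These bounds can be checked empirically. The sketch below emulates single precision by rounding Python doubles to 32-bit floats with the standard `struct` module, then measures the worst relative error of one rounding and of a two-step sum over random positive inputs. The helper names `fl` and `rel_err`, the input range, and the sample count are choices made for this illustration.

```python
import random
import struct

EPS = 2.0 ** -23   # about 1.19e-7, the single precision bound used on the slide

def fl(x):
    """Round a double to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rel_err(approx, exact):
    return abs(approx - exact) / abs(exact)

random.seed(0)
worst_round = 0.0
worst_sum = 0.0
for _ in range(10000):
    x, y, z = (random.uniform(1.0, 1000.0) for _ in range(3))
    worst_round = max(worst_round, rel_err(fl(x), x))
    # Start from exactly representable inputs, then round after each addition.
    fx, fy, fz = fl(x), fl(y), fl(z)
    s = fl(fl(fx + fy) + fz)
    worst_sum = max(worst_sum, rel_err(s, fx + fy + fz))

print(worst_round <= EPS, worst_sum <= 2 * EPS)
```

The inputs are all positive here; with mixed signs, cancellation can make the relative error of a sum much larger than the first-order bound suggests.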
Numerical Error In Volume Ray Casting • Cells are (much) smaller than floating point precision • We can use cell-local coordinates, but we have to cross many cells • How much error do we accumulate along the ray? • If the error is large, we may have entered the wrong cell. This is hard to correct. • We should advance the ray such that the numerical error does not accumulate.
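The first bullet can be made concrete by measuring the gap between adjacent single precision values. The sketch below assumes world coordinates are stored in meters (an assumption for illustration) and compares the spacing at a 40 km coordinate with the 0.3 mm cell size, then with the spacing of a cell-local coordinate in [0, 1).

```python
import struct

def next_float32(x):
    """Return the next larger single precision value after positive, finite x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def spacing32(x):
    """Gap between x (rounded to single precision) and its upper neighbor."""
    x32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return next_float32(x32) - x32

CELL_SIZE_M = 0.0003                 # 0.3 mm cell size from the slides

world_gap = spacing32(40000.0)       # coordinate of 40 km, in meters (assumed unit)
local_gap = spacing32(0.5)           # cell-local coordinate in [0, 1)

print(f"gap at 40 km: {world_gap * 1000:.3f} mm vs cell size 0.3 mm")
print(f"gap at local 0.5: {local_gap:.3e} of a cell")
```

At the far end of the canvas, adjacent single precision values are several millimeters apart, so world coordinates cannot distinguish neighboring cells, while a cell-local coordinate keeps far finer resolution inside the cell.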
Numerical Error In Ray Casting [Figure: ray geometry with labels $t$, $\alpha$, $p_i$, $p_1$, $p_{\text{left-eye}}$, $p_{\text{eye-center}}$, $p_{\text{right-eye}}$]
Error Does Not Increase By The Ray Length L
[Plots] Half Precision: $\varepsilon = 10^{-3}$ • Single Precision: $\varepsilon = 1.19 \times 10^{-7}$
Conclusion • We have chosen a scheme with a small enough error
Thank you Project Webpage : http://graphics.ewha.ac.kr/canvox/ Yeojin Kim, yeojinkim@ewhain.net Byungmoon Kim, bmkim@adobe.com Jiyang Kim, soarmin11@ewhain.net Young J. Kim, kimy@ewha.ac.kr This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIP) (No. 2017R1A2B3012701)