Algorithms for Real-Time Graphics (Algorithmen für die Echtzeitgrafik)
Temporal Coherence
Daniel Scherzer, scherzer@cg.tuwien.ac.at
LBI Virtual Archeology

Syllabus
1. Introduction
2. Image space
   1. Theory: Image-space reverse reprojection
   2. Applications
3. Object space

Image Space

Object Space

Temporal Coherence: Introduction

What is Temporal Coherence
• Information that stays valid for multiple queries
• Min. 60 FPS in real-time rendering (RTR) → high temporal coherence

Objectives of Using Temporal Coherence
• Speed up: distribute workload over several frames
• Increase in quality: incorporate calculations from previous frames
• Reducing temporal aliasing (flickering): avoid sudden changes in coherent regions

Conclusion
• Idea of temporal coherence (TC)
• Next: Image-Space Real-Time Reverse Reprojection
[Figure labels: quality, stability, speed]

Temporal Coherence: Image-Space Real-Time Reverse Reprojection

Outline
• Image-space spatio-temporal data structure
• Reverse reprojection cache
• Implementation
• Determining what to reuse
• Analysis

Image space shading cache
[Figure]

Reprojection
• No exact 1-to-1 pixel mapping (bijection) exists
[Figure: forward reprojection vs. reverse reprojection between Frame n-1 (cache) and Frame n]

Forward reprojection
• Requires forward motion vectors
• Holes and gaps need filtering with depth culling
• Difficult to implement with DX9/10 level hardware
[Figure: cache (f_{t-1}) mapped forward to new frame (f_t). Image courtesy of Bruce Walter]

Reverse reprojection [Nehab 06/07, Scherzer 07]
• Reprojection operator: (x′, y′, z′) = π_{t-1}(p)
• Cache: f_{t-1}, cache depth: d_{t-1}
• Test if z′ ≈ d_{t-1}(x′, y′) for occlusion
[Figure: new frame (f_t) looked up in cache (f_{t-1}) via π_{t-1}]
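
The reprojection operator π_{t-1}(p) above can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the shader from the slides: the function names, the row-major 4x4 matrix layout, and the OpenGL-style conventions (NDC in [-1, 1], depth remapped to [0, 1], no y flip) are all assumptions made for the example.

```python
def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix (list of 4 rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def reproject(prev_view_proj, p_world, width, height):
    """Sketch of pi_{t-1}(p): map world-space point p into frame t-1.

    prev_view_proj is the (assumed) view-projection matrix of frame t-1.
    Returns (x', y', z') in pixel coordinates and [0, 1] depth,
    or None if the point lies behind the previous camera.
    """
    x, y, z, w = mat_vec(prev_view_proj,
                         [p_world[0], p_world[1], p_world[2], 1.0])
    if w <= 0.0:
        return None  # not visible in frame t-1
    # Perspective divide to normalized device coordinates
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform: NDC [-1, 1] to pixels, depth to [0, 1]
    px = (ndc_x * 0.5 + 0.5) * width
    py = (ndc_y * 0.5 + 0.5) * height
    depth = ndc_z * 0.5 + 0.5
    return px, py, depth


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(reproject(identity, (0.0, 0.0, 0.0), 640, 480))
```

The returned z′ is the value that would be compared against the cached depth d_{t-1}(x′, y′) in the occlusion test.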

Case study: Pixel shader acceleration
• Today, pixel shaders consume a large portion of the render budget
• Reuse expensive computation results
• Reverse reprojection cache (RRC) [Nehab 06, 07]
• Regular rendering loop: recompute every pixel using the original pixel shader
• Reuse previous results using the RRC
• Reshade on demand
• Cache reuse path must be cheaper
[Diagram: Lookup → Hit? If yes: Load/Reuse → Update. If no: Recompute → Update.]

Outline
• Image-space spatio-temporal data structure
• Reverse reprojection cache
• Implementation
  • Computing cache coordinate / cache miss
  • Cache resampling
  • Refreshing strategies
  • Control flow
• Determining what to reuse
• Analysis

Determining cache coordinate
[Figure: point p in Frame t is reprojected to π_{t-1}(p) in the cache (Frame t-1)]
Slide courtesy of Diego Nehab

Analogy: shadow map (first pass)
• Render scene from light-view and save depth values
[Figure: Light, Shadow map]

Analogy: shadow map (second pass)
• Render scene from light-view and save depth values
• Render scene from eye-view
• Transform each fragment to light source space
• Compare z_eye with the z_light value stored in the shadow map
[Figure: Light, Eye, Eye-view, Shadow map]

Determining cache coordinates
• Shader code
  • Projection space position for t-1
  • Viewport transform
  • No need to flip y in OpenGL

Detecting cache misses
• Depth as an ID
• Bilinear Z interpolation for smooth surfaces
• Depth is non-linear but approximate
• Discontinuity edge: discard
• Z separating threshold ε > depth buffer accuracy
• FP complementary Z buffer [Akeley and Su 2006]
[Figure: desired depth π_{t-1}(p).z in Frame t compared with the interpolated depth d_{t-1}(p) in the cache (Frame t-1)]
  d_{t-1}(p) > π_{t-1}(p).z (miss)
  d_{t-1}(p) < π_{t-1}(p).z (miss)
  d_{t-1}(p) ≈ π_{t-1}(p).z (hit)

Detecting cache misses
• Intersecting objects have similar depths
  • Use object ID as an additional ID
• Viewport clipping
  • Either: invalidate the texture fetch outside the boundary (e.g. use D3D10_TEXTURE_ADDRESS_BORDER)
  • Or: explicitly test
• Final shader fragment
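
The hit/miss rules above can be sketched as a small lookup function. This is an illustrative sketch only: the function name, the nested-list depth cache, the nearest-texel fetch (the slides use bilinear Z interpolation), and the default threshold `eps` are assumptions for the example.

```python
import math


def cache_lookup(depth_cache, px, py, z_reproj, eps=1e-3):
    """Classify a reprojected sample as 'hit' or 'miss'.

    depth_cache: 2D list holding d_{t-1}, indexed as [row][column].
    (px, py, z_reproj): the reprojected position pi_{t-1}(p).
    eps: Z separating threshold, assumed larger than the depth
         buffer accuracy.
    """
    height = len(depth_cache)
    width = len(depth_cache[0])
    ix, iy = math.floor(px), math.floor(py)
    # Viewport clipping: explicitly test the cache boundary
    if not (0 <= ix < width and 0 <= iy < height):
        return "miss"
    # Depth as an ID: hit only if the cached depth matches pi_{t-1}(p).z
    if abs(depth_cache[iy][ix] - z_reproj) <= eps:
        return "hit"
    return "miss"


cache = [[0.5, 0.9],
         [0.5, 0.5]]
print(cache_lookup(cache, 0.5, 0.5, 0.5))   # depth matches the cached value
print(cache_lookup(cache, 1.5, 0.5, 0.5))   # cached depth differs
print(cache_lookup(cache, -1.0, 0.5, 0.5))  # outside the viewport
```

An object ID comparison could be added as a second condition next to the depth test for intersecting objects with similar depths.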

Cache resampling and filtering
• No 1-to-1 pixel mapping
• Common resampling: Nearest, Bilinear

Cache resampling and filtering
• Nearest (point) resampling
  • Texture shift and distortion
  • Fractional pixel velocity, e.g. (0.5, 0.5)
[Figure: Frame t-3, Frame t-2, Frame t-1, Frame t]

Cache resampling and filtering
• Bilinear resampling
  • Blur, acceptable < 10 frames
• Bicubic resampling
  • Less blur
  • 16 texture fetches can be reduced to 4 [GPU Gems 2, Ch. 20]

Cache resampling and filtering
• Minification and magnification
  • Minification: pixels become smaller at t
  • Magnification: pixels become larger at t
[Figure: Frame t-1, Frame t]
• Minification:
  • Generate a mip chain, read appropriate mip level
• Magnification:
  • Estimate error: reprojected pixel size and position
  • Force cache miss when reprojected pixel size does not cover any pixel center
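
The bilinear resampling option listed above can be illustrated with a minimal CPU sketch. The function name, the nested-list cache, the texel-center convention (centers at integer + 0.5), and the clamp-to-edge addressing are assumptions for this example, not details taken from the slides.

```python
import math


def bilinear_fetch(cache, px, py):
    """Bilinearly sample a 2D list `cache` at continuous pixel coordinates.

    Texel centers are assumed to sit at integer + 0.5; lookups outside
    the cache are clamped to the nearest edge texel.
    """
    height, width = len(cache), len(cache[0])
    # Shift so that texel centers fall on integer coordinates
    fx, fy = px - 0.5, py - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0  # fractional weights in [0, 1)

    def texel(x, y):
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return cache[y][x]

    top = texel(x0, y0) * (1 - tx) + texel(x0 + 1, y0) * tx
    bottom = texel(x0, y0 + 1) * (1 - tx) + texel(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty


cache = [[0.0, 1.0],
         [0.0, 1.0]]
print(bilinear_fetch(cache, 0.5, 0.5))  # exactly on a texel center
print(bilinear_fetch(cache, 1.0, 1.0))  # half-pixel offset: neighbors are averaged
```

A fractional offset such as (0.5, 0.5) averages neighboring texels on every reuse, which is where the blur of repeated bilinear resampling comes from.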

Cache resampling and filtering
• Magnification
• Shader code

Refreshing strategies
• Source of error
  • Resampling error
  • Shading signal change
• Refresh pixels in round-robin fashion
  • Divide pixels equally into n groups
  • Each pixel has a group ID: i ∈ [0, n-1]
  • Current frame count: t
  • Refresh when (t + i) mod n = 0

Refreshing strategies
• Tiled refresh
[Figure: 2 x 2 tile with group IDs 0 1 / 2 3, one refreshed and the others cached]
• Random block refresh
[Figure: blocks with randomly assigned group IDs 0 to 7]

Refreshing strategies
• Random block refresh granularity
  • Block size at least 2 x 2 for GPU efficiency
[Figure: block sizes 1x1, 2x2, 4x4]
• Dynamically change n per pixel
[Figure: Miss, Refresh, Reuse for n = 10]

Control flow
• Single-pass implementation
  • Rely on GPU dynamic flow control (DFC)
  • Unbalanced branching causes performance loss
  • Blocks of pixels get penalized by one cache miss
[Diagram: Lookup → Hit? If yes: Fetch cache payload. If no: Recompute cache payload. Then: Compute shading using payload → Update cache, Output color.]

Control flow
• Two-pass implementation
  • First pass: execute cache hit route
  • Second pass: execute cache miss route
  • Early-Z culling detects unprocessed pixels
[Diagram, first pass: Lookup → Hit? If yes: Fetch cache payload → Compute shading using payload → Update cache, Output color. If no: Discard pixel (no z-buffer change).]
[Diagram, second pass: Original shader: Recompute cache payload → Compute shading using payload → Update cache, Output color.]
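
The round-robin refresh rule from the refreshing strategies slides, (t + i) mod n = 0, can be sketched as follows. The function names and the tile-based group ID assignment are assumptions for illustration; the 2 x 2 pattern simply mirrors the 0 1 / 2 3 layout of the tiled refresh figure.

```python
def needs_refresh(t, group_id, n):
    """Round-robin rule: refresh a pixel when (t + i) mod n == 0."""
    return (t + group_id) % n == 0


def tile_group_id(x, y, tile=2):
    """Hypothetical tiled assignment: group ID from the position inside
    a tile x tile block, giving n = tile * tile groups (0 1 / 2 3 for 2 x 2)."""
    return (x % tile) + tile * (y % tile)


n = 4
for t in range(4):
    refreshed = [(x, y) for y in range(2) for x in range(2)
                 if needs_refresh(t, tile_group_id(x, y), n)]
    print(t, refreshed)
```

With n groups, every pixel is recomputed once every n frames, so roughly 1/n of the pixels take the expensive path in any given frame, in addition to those that miss the cache.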