“It Just Works”: Ray-Traced Reflections in ‘Battlefield V’ Johannes Deligiannis, Jan Schmid (EA DICE)
TODAY we present Raytracing • Project background • GPU Raytracing Pipeline • Engine integration of DXR • GPU Performance
Battlefield V • FPS set in WWII • Released Nov 2018 • Raytracing work began Dec 2017 • First DXR game released!
Project Background • ~10 months dev time • Use DXR in Battlefield V • AO • GI • Shadows • Reflections • Engineering • Yasin Uludag (EA DICE) • Johannes Deligiannis (EA DICE) • Jiho Choi (NVIDIA) • Pawel Kozlowski (NVIDIA) • And a bunch of other people! ☺
Main Challenges • Not a Tech Demo • Content is set • Game in full production • Scope of Engine changes • Performance • Denoising vs Ray Count • Early adopter tax • API not final • Driver hang/bugs • BSoD • No capture tool (Nsight, Pix) • No RTX cards • But we shipped it ☺
(Simple) raytracing pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Generate Rays • Lookup Texture • G Buffer *Tomasz Stachowiak and Yasin Uludag, SIGGRAPH 2015. “Stochastic Screen-Space Reflections”
Raytracing MAGIC
Light Rays
float4 light(MaterialData surfaceInfo, float3 rayDir)
{
    float4 radiance = 0; // accumulated over every light type
    foreach (light : pointLights)
        radiance += calcPoint(surfaceInfo, rayDir, light);
    foreach (light : spotLights)
        radiance += calcSpot(surfaceInfo, rayDir, light);
    foreach (light : reflectionVolumes)
        radiance += calcReflVol(surfaceInfo, rayDir, light);
    …
    return radiance;
}
Light Combine • Lookup Texture • Lit Raster result
Unhappy: bad bad bad, very sad crying faces • Rays contribute less • Sloooow • Very noisy
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Intersect/Material Data → Light Rays → Light Combine
Variable Rate Tracing • Classify: Max Ratio • Normalize • [Figure: per-tile grids of ratios between 0 and 1, and resulting per-tile ray counts of 0, 128 and 256]
Variable Rate Tracing • [Figure: tiles traced with 256, 128, 64 and 32 rays]
Variable Rate Tracing • Success! • More rays on water • More rays on grazing angles
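The classify and normalize steps can be sketched on the CPU as follows. This is a minimal illustration, not the shipped implementation: it assumes every pixel carries an importance ratio in [0, 1], that a tile takes the maximum ratio of its pixels, and that a fixed ray budget is split in proportion to the normalized tile weights. The function name `allocateRays` and all parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: split a ray budget across screen tiles.
// pixelRatio holds one importance value in [0, 1] per pixel, row-major.
std::vector<int> allocateRays(const std::vector<float>& pixelRatio,
                              int width, int height, int tileSize,
                              int totalRays)
{
    assert(tileSize > 0 && width % tileSize == 0 && height % tileSize == 0);
    assert(pixelRatio.size() == static_cast<std::size_t>(width) * height);
    const int tilesX = width / tileSize;
    const int tilesY = height / tileSize;
    std::vector<float> tileMax(static_cast<std::size_t>(tilesX) * tilesY, 0.0f);

    // Classify: each tile takes the maximum ratio found among its pixels.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t tile =
                static_cast<std::size_t>(y / tileSize) * tilesX + (x / tileSize);
            const std::size_t pixel = static_cast<std::size_t>(y) * width + x;
            tileMax[tile] = std::max(tileMax[tile], pixelRatio[pixel]);
        }
    }

    // Normalize: scale the tile weights so that they sum to one.
    float sum = 0.0f;
    for (float t : tileMax) sum += t;

    std::vector<int> rays(tileMax.size(), 0);
    if (sum <= 0.0f) return rays;  // nothing reflective on screen
    for (std::size_t i = 0; i < tileMax.size(); ++i)
        rays[i] = static_cast<int>(totalRays * (tileMax[i] / sum));
    return rays;
}
```

With two tiles whose maximum ratios are 0.25 and 0.75 and a budget of 256 rays, the sketch assigns 64 and 192 rays, so highly reflective tiles (water, grazing angles) receive most of the budget.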
Problem
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → Intersect/Material Data → Light Rays → Light Combine
Ray Binning • Bin Index 3012 = Screen Offset (3) + Angle (012)
Ray Binning
Ray Binning • Rays: Ray 1000 → Bin 3011, Ray 1001 → Bin 3013, Ray 1002 → Bin 3011 • Atomic Increment per bin gives Local Offsets: 0, 0, 1 • Bin counts: Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1
Ray Binning • Exclusive Parallel Sum* over the bin counts (2, 0, 1) gives the bin start offsets: Bin 3011 = 1000, Bin 3012 = 1002, Bin 3013 = 1002 • Local Offsets: 0, 0, 1 *Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA”
Ray Binning • Lookup bin start offset, Add local offset • Ray 1000 → 1000 + 0 • Ray 1002 → 1000 + 1 • Ray 1001 → 1002 + 0 • Resulting order: Ray 1000, Ray 1002, Ray 1001
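The three binning passes (atomic increment, exclusive prefix sum, lookup and add) can be sketched serially in C++. This is an illustrative CPU version under stated assumptions: bins are indexed from 0 instead of 3011, the per-bin counter stands in for the GPU atomic, and `binRays` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns ray indices reordered so that rays sharing a bin are contiguous.
// binOfRay[i] is the bin index of ray i (hypothetical input layout).
std::vector<std::uint32_t> binRays(const std::vector<std::uint32_t>& binOfRay,
                                   std::uint32_t numBins)
{
    const std::size_t n = binOfRay.size();
    std::vector<std::uint32_t> binCount(numBins, 0);
    std::vector<std::uint32_t> localOffset(n);

    // Pass 1: per-bin counter. On the GPU this would be an atomic increment
    // whose returned value is the ray's local offset inside its bin.
    for (std::size_t i = 0; i < n; ++i) {
        assert(binOfRay[i] < numBins);
        localOffset[i] = binCount[binOfRay[i]]++;
    }

    // Pass 2: exclusive prefix sum over the counts gives each bin's start.
    std::vector<std::uint32_t> binStart(numBins, 0);
    std::uint32_t running = 0;
    for (std::uint32_t b = 0; b < numBins; ++b) {
        binStart[b] = running;
        running += binCount[b];
    }

    // Pass 3: final slot = bin start + local offset.
    std::vector<std::uint32_t> sorted(n);
    for (std::size_t i = 0; i < n; ++i)
        sorted[binStart[binOfRay[i]] + localOffset[i]] =
            static_cast<std::uint32_t>(i);
    return sorted;
}
```

Feeding it the slide's example (three rays in bins 0, 2, 0, standing in for Bins 3011, 3013, 3011) yields the order ray 0, ray 2, ray 1, matching Ray 1000, Ray 1002, Ray 1001.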
Problem
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Light Rays → Light Combine
SS-Hybridization • Rays → Hierarchical Screen Space Trace • Hit: Material Data • Miss / Rejected / Give Up: Intersect/Material Data → Material Data • Material Data → Light Material → Radiance [Stachowiak et al 15] "Stochastic Screen-Space Reflections"
SS-Hybridization
SS-Hybridization
Problem • After Raytrace, hits and misses are interleaved • Light Shader wavefront: Hit lanes are Busy, Miss lanes are Idle
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Light Rays → Light Combine
Defrag • Hit flags: 1 0 1 0 0 1 1 0 1 0 1 0 • Exclusive Parallel Sum*: 0 1 1 2 2 2 3 4 4 5 5 6 • Packed result: Hit Hit Hit Hit Hit Hit *Mark Harris, Shubhabrata Sengupta, and John Owens. “Parallel Prefix Sum (Scan) with CUDA”
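Defrag is a stream compaction: the exclusive prefix sum of the hit flags gives every hit its destination slot in a densely packed array. Below is a serial C++ sketch of that idea; the GPU version would use a parallel scan as in the cited Harris et al. work, and the names `exclusiveScan` and `compactHits` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i - 1].
std::vector<std::uint32_t> exclusiveScan(const std::vector<std::uint32_t>& in)
{
    std::vector<std::uint32_t> out(in.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Packs the indices of all rays that hit something into a dense array.
// hitFlag[i] is 1 for a hit and 0 for a miss.
std::vector<std::uint32_t> compactHits(const std::vector<std::uint32_t>& hitFlag)
{
    const std::vector<std::uint32_t> dst = exclusiveScan(hitFlag);
    std::size_t hitCount = 0;
    for (std::uint32_t f : hitFlag) {
        assert(f <= 1);
        hitCount += f;
    }

    std::vector<std::uint32_t> packed(hitCount);
    for (std::size_t i = 0; i < hitFlag.size(); ++i)
        if (hitFlag[i])
            packed[dst[i]] = static_cast<std::uint32_t>(i);  // scatter to slot
    return packed;
}
```

For the slide's flags 1 0 1 0 0 1 1 0 1 0 1 0 the scan is 0 1 1 2 2 2 3 4 4 5 5 6 and the six hits land in slots 0 to 5, so a lighting wavefront launched over the packed array has no idle lanes except in its last, partially filled group.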
Problem • Light Shader wavefront: all lanes Busy • Light Shader: 2.0ms
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Light Rays (Per Cell Light List Lighting) → Light Combine
Per Cell Light Lists • Cell list: Light 2 → Light 3 → Next • Cell list: Light 0 → Light 1 → Light 3 → Next
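A per-cell light list can be sketched as a grid of linked lists that share one node pool, so a ray hit only loops over the lights stored in its cell. The version below is a simplified assumption, not the engine's data structure: it uses a 2D world-space grid with its origin at (0, 0), point lights with a radius, and a conservative bounding-square overlap test. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t kEndOfList = -1;

struct PointLight { float x, z, radius; };            // hypothetical 2D light
struct LightNode  { std::int32_t lightIndex, next; }; // "Next" pointer per node

struct LightGrid {
    int cellsX = 0, cellsZ = 0;
    float cellSize = 1.0f;
    std::vector<std::int32_t> head;  // per cell: first node or kEndOfList
    std::vector<LightNode> nodes;    // node pool shared by all cells
};

static int cellCoord(float v, float cellSize)
{
    return static_cast<int>(std::floor(v / cellSize));
}

LightGrid buildLightGrid(const std::vector<PointLight>& lights,
                         int cellsX, int cellsZ, float cellSize)
{
    assert(cellsX > 0 && cellsZ > 0 && cellSize > 0.0f);
    LightGrid grid;
    grid.cellsX = cellsX;
    grid.cellsZ = cellsZ;
    grid.cellSize = cellSize;
    grid.head.assign(static_cast<std::size_t>(cellsX) * cellsZ, kEndOfList);

    for (std::size_t i = 0; i < lights.size(); ++i) {
        const PointLight& l = lights[i];
        // Cells overlapped by the light's bounding square, clamped to the grid.
        const int x0 = std::max(0, cellCoord(l.x - l.radius, cellSize));
        const int x1 = std::min(cellsX - 1, cellCoord(l.x + l.radius, cellSize));
        const int z0 = std::max(0, cellCoord(l.z - l.radius, cellSize));
        const int z1 = std::min(cellsZ - 1, cellCoord(l.z + l.radius, cellSize));
        for (int z = z0; z <= z1; ++z) {
            for (int x = x0; x <= x1; ++x) {
                std::int32_t& h =
                    grid.head[static_cast<std::size_t>(z) * cellsX + x];
                // Push the new node at the front of the cell's list.
                grid.nodes.push_back(LightNode{static_cast<std::int32_t>(i), h});
                h = static_cast<std::int32_t>(grid.nodes.size() - 1);
            }
        }
    }
    return grid;
}

// Walks the list of the cell containing (x, z) and returns its light indices.
std::vector<std::int32_t> lightsAt(const LightGrid& g, float x, float z)
{
    std::vector<std::int32_t> result;
    const int cx = cellCoord(x, g.cellSize);
    const int cz = cellCoord(z, g.cellSize);
    if (cx < 0 || cz < 0 || cx >= g.cellsX || cz >= g.cellsZ) return result;
    std::int32_t n = g.head[static_cast<std::size_t>(cz) * g.cellsX + cx];
    while (n != kEndOfList) {
        result.push_back(g.nodes[static_cast<std::size_t>(n)].lightIndex);
        n = g.nodes[static_cast<std::size_t>(n)].next;
    }
    return result;
}
```

A light that straddles two cells appears in both lists, which mirrors the slide where Light 3 is reachable from two different cells.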
Problem
Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Per Cell Light List Lighting → Denoise → Light Combine
Denoising • Reuse Spatial Information: BRDF Filter • Reuse Temporal Information: Temporal Filter [Stachowiak et al 15] "Stochastic Screen-Space Reflections"
BRDF Denoise Filter • $L_0 \approx FG \, \dfrac{\sum_{k=1}^{N} L_i(l_k)\, f_s(l_k \to v)\, \cos\theta_{l_k} / p_k}{\sum_{k=1}^{N} f_s(l_k \to v)\, \cos\theta_{l_k} / p_k}$ • Kernel Size????
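The BRDF filter follows the ratio estimator from the cited Stochastic Screen-Space Reflections work: each reused sample is weighted by its BRDF value times cosine over PDF, the weighted radiance is divided by the sum of the weights, and the result is scaled by the pre-integrated FG term. The C++ sketch below evaluates that estimator for a single channel; the `Sample` layout and the `resolve` name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One reused neighbour sample (hypothetical layout, single colour channel).
struct Sample {
    float radiance;  // L_i(l_k): radiance carried by the sample ray
    float brdf;      // f_s(l_k -> v): BRDF evaluated at the centre pixel
    float cosTheta;  // cos(theta_{l_k})
    float pdf;       // p_k: probability density the ray was generated with
};

// Ratio estimator: FG * sum(L_k * w_k) / sum(w_k), w_k = brdf * cos / pdf.
float resolve(const std::vector<Sample>& samples, float fg)
{
    float weightedRadiance = 0.0f;
    float weightSum = 0.0f;
    for (const Sample& s : samples) {
        assert(s.pdf > 0.0f);
        const float w = s.brdf * s.cosTheta / s.pdf;
        weightedRadiance += s.radiance * w;
        weightSum += w;
    }
    // No usable samples: return black instead of dividing by zero.
    return weightSum > 0.0f ? fg * weightedRadiance / weightSum : 0.0f;
}
```

Because the weights appear in both numerator and denominator, a neighbourhood of constant radiance resolves to exactly FG times that radiance regardless of how the weights vary, which is what makes the estimator robust when the kernel borrows samples from neighbouring pixels.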
BRDF Denoise Filter ?????
BRDF Denoise Filter
BRDF Denoise Filter • Frame N-1 • Frame N
BRDF Denoise Filter • [Figure: block of Threads surrounded by a border of Pad cells; actual sample counts per region: 6, up to 13, 16, 6]
BRDF Denoise Filter
Temporal Denoise Filter • Is it a good sample? • If only... • BRDF Denoiser!
Temporal Denoise Filter • Still Noisy
Image Denoise Filter • Generate LUT: { angle, roughness } to { width, height } for unit length ray
Image Denoise Filter • [Figure: ∗ =]
Image Denoise Filter • [Figure: 1 1 ∗ = 2 2]
Image Denoise Filter
New Pipeline • Variable Rate Tracing → Generate Rays → Ray Binning → Screen Space Hybrid → Intersect/Material Data • Timings on slide: 0.37ms, 0.19ms, 0.15ms, 0.36ms, 1.98ms
New Pipeline • 6.29ms total • Intersect/Material Data → Defrag → ‘Improved’ Lighting → Spatial Filter → Temporal Filter → Image Filter • Timings on slide: 0.46ms, 1.45ms, 0.24ms, 1.00ms, 1.98ms, 0.08ms
DXR – a.k.a. “BLACK BOX” • No DXR • Intersection • Shading
DXR basics • BLAS - Bottom Level Acceleration Structure • TLAS - Top Level Acceleration Structure • [Figure: TLAS instances A, B, C, D, each pairing a BLAS with a 4×4 transform matrix] • CS • Skinning, Destruction • Compute shader • Update each frame • BLAS can update incrementally
ACCELERATION STRUCTURE • Which objects? • Frustum Culling • Occlusion Culling • Easy... no culling!
Acceleration structure – FIRST PASS • Rotterdam • 20200 TLAS instances... • 5000 BLAS rebuilds... • GPU rebuild 64 ms (!)
What to do? • Idea: Reduce instance count • Use a culling heuristic • Accept (some) minor artifacts
Culling HEURISTIC • Assumption: • Far away objects not important • Except for large objects • Bridge, building etc • Need some kind of measurement...
Culling • Project bounding sphere • $\theta = \tan^{-1}(r / d)$, where $r$ is the sphere radius and $d$ the distance • If $\theta < \text{Threshold}$ (both in degrees): Cull
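The heuristic can be sketched in a few lines of C++. The sketch assumes the projected angle is computed as atan(r / d) from the bounding-sphere radius r and the distance d to the viewer, and that the threshold is given in degrees; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Angle in degrees subtended by a bounding sphere of the given radius
// when seen from the given distance (assumed formula: atan(r / d)).
float projectedAngleDeg(float radius, float distance)
{
    assert(radius >= 0.0f && distance > 0.0f);
    const float kRadToDeg = 180.0f / 3.14159265358979f;
    return std::atan(radius / distance) * kRadToDeg;
}

// An object is culled from the acceleration structure when it looks
// smaller than the threshold angle.
bool shouldCull(float radius, float distance, float thresholdDeg)
{
    return projectedAngleDeg(radius, distance) < thresholdDeg;
}
```

With a 4 degree threshold, a 1 m object at 100 m (about 0.6 degrees) is culled, while a 50 m bridge at the same distance (about 26.6 degrees) is kept, which matches the stated goal of dropping far away objects except for large ones.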
Culling • [Figure: reference (no culling) vs θ = 4° vs θ = 15°]
Culling • [Figure: culled objects, reference (no culling) vs θ = 4°]
CULLING - RESULTS • 4 deg culling • 5000 -> 400 BLAS rebuilds each frame • 20000 -> 2800 TLAS instances • TLAS + BLAS build (GPU): 64 ms -> 14.5 ms • Pros • Faster • Cons • Occasional popping • Missing objects
BLAS update optimizations • Still expensive! More ideas: 1. Stagger full and incremental BLAS rebuild • N frames incremental before full rebuild 2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD 3. Avoid redundant rebuilds • Check CS input (bone matrix) • 400 -> 50 • Overlap BLAS update with GFX • Gbuffer, shadowmaps
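Ideas 1 and 3 can be combined into a small per-BLAS scheduling decision, sketched below in C++. It is an assumption-laden illustration rather than engine code: the bone-matrix input is represented by a caller-provided hash, `framesBetweenRebuilds` plays the role of N, and all names are hypothetical. In D3D12 an incremental update corresponds to building with D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE on a BLAS created with the ALLOW_UPDATE flag.

```cpp
#include <cassert>
#include <cstdint>

enum class BlasAction { Skip, IncrementalUpdate, FullRebuild };

// Per-BLAS bookkeeping (hypothetical).
struct BlasState {
    std::uint64_t lastInputHash = 0;        // hash of the last bone matrices used
    std::uint32_t updatesSinceRebuild = 0;  // incremental updates since last rebuild
    bool built = false;
};

// Decides what to do with one BLAS this frame.
BlasAction chooseBlasAction(BlasState& state, std::uint64_t inputHash,
                            std::uint32_t framesBetweenRebuilds)
{
    // Idea 3: unchanged compute-shader input means the geometry is unchanged.
    if (state.built && inputHash == state.lastInputHash)
        return BlasAction::Skip;

    state.lastInputHash = inputHash;

    // Idea 1: after N incremental updates (or on first use) rebuild fully,
    // since repeated refits gradually degrade acceleration structure quality.
    if (!state.built || state.updatesSinceRebuild >= framesBetweenRebuilds) {
        state.built = true;
        state.updatesSinceRebuild = 0;
        return BlasAction::FullRebuild;
    }

    ++state.updatesSinceRebuild;
    return BlasAction::IncrementalUpdate;
}
```

Giving each BLAS a different initial `updatesSinceRebuild` value would spread the full rebuilds across frames instead of letting them all fall on the same one, which is one possible reading of "stagger".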
Results • TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms • RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags) • Much better ☺
SHADING (OPAQUE) • RT ON | SHADING OFF • RT ON | SHADING ON