April 4-7, 2016 | Silicon Valley
OPENGL BLUEPRINT RENDERING
Christoph Kubisch, 4/7/2016

MOTIVATION
- Blueprints / drawings in CAD/graph viewer applications
- Documents can contain many LINES and LINE_STRIPS
- Various line styles can be used (world-space widths, stippling, joints, caps...)
- Potential CPU bottlenecks
  - Generating geometry for complex styles
  - Collecting and rendering geometry
Model courtesy of PTC

MOTIVATION
- Not targeting full vector graphics
  - NV_path_rendering covers high-fidelity vector graphics rendering
  - Per-pixel quadratic Bézier evaluation
  - Stencil & Cover pass to allow sophisticated blending
- Focus of this talk is rendering lines defined by traditional vertices
  - Rendering data from OpenGL buffer objects
  - Single-pass, which means it is not safe for blending (the geometry self-overlaps)

DEMO: BASIC DEMONSTRATION

LINE RASTERIZATION
Representation
- Standard: skewed rectangle, pixel-snapped lines
- Multisampling: aligned rectangle, smooth lines
- Both suffer from visible gaps and overlaps as the line width increases

LINE RASTERIZATION
Stippling
- Stippling only in screen space
- Patterns must be expressible with 16 bits
- LINES restart the pattern every segment
- LINE_STRIPS have continuous distance

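The 16-bit pattern limit can be illustrated with a small CPU-side sketch of how classic line stippling selects a pattern bit. The function name `stipple_bit` is hypothetical; `factor` plays the role of the repeat factor passed to `glLineStipple`, and `s` is the screen-space stipple counter, which restarts per segment for LINES and keeps counting along a LINE_STRIP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the fragment at stipple counter s
 * is drawn, 0 if it is skipped. Bit 0 of the pattern is used first and
 * the pattern wraps after 16 bits. */
int stipple_bit(uint16_t pattern, int factor, unsigned s)
{
    assert(factor >= 1);
    unsigned bit = (s / (unsigned)factor) % 16u;
    return (int)((pattern >> bit) & 1u);
}
```

With `pattern = 0x00FF` and `factor = 1`, eight fragments are drawn, eight are skipped, and the pattern then repeats. Anything that does not fit this 16-entry on/off scheme is what motivates the shader-driven approach.
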
SHADER-DRIVEN LINES
TECH
- Create TRIANGLES/QUADS for line segments
- Project extruded vertices to keep line width consistent
- Clip and color in fragment shader based on UV coordinates and line distance
Figure labels: Appearance on screen; Geometry in world coordinates; Shapes via fragment shader discard

SHADER-DRIVEN LINES
TECH
- Create TRIANGLES for line segments, project extrusion to world/screen, discard fragments
FLEXIBILITY
- Arbitrary stippling patterns and line widths
- Joint- and cap-styles
- Different distance metrics
- New coloring/animation possibilities via shaders
- Thin center line as effect

SHADER-DRIVEN LINES
CAVEATS
- Cannot be as fast as basic line rasterization
- Not all data local at rendering time (line strip distances need extra calculation)
- Geometry still self-overlaps

SHADER-DRIVEN LINES
Sample implementation/library
- C interface library to render different line primitives (LINES, LINE_STRIPS, ARCS)
- Provided as a flexible framework rather than a black box
- Two different render modes: render as extruded triangles, or as one-pixel-wide lines
- Uses NVIDIA and ARB OpenGL extensions if available

SHADER-DRIVEN LINES
Sample implementation/library
- Global style and stipple definitions
- Stipple from arbitrary bit-pattern, or float values
Diagram: Style-Definitions (Style 0, Style 1, ...) reference Stipple-Patterns (Pattern texture A, Pattern texture B, ...)

    typedef enum NVLSpaceType_e {
      NVL_SPACE_SCREEN,
      NVL_SPACE_SCREENDIST3D,
      NVL_SPACE_CUSTOM,
      NVL_SPACE_CUSTOMDIST3D,
      NVL_NUM_SPACES,
    }NVLSpaceType;

    typedef enum NVLCapsType_e {
      NVL_CAPS_NONE,
      NVL_CAPS_ROUND,
      NVL_CAPS_BOX,
      NVL_NUM_CAPS,
    }NVLCapsType;

    typedef enum NVLAnchorType_e {
      NVL_ANCHOR_BEGIN,
      NVL_ANCHOR_END,
      NVL_ANCHOR_BOTH,
      NVL_NUM_ANCHORS,
    }NVLAnchorType;

    typedef enum NVLJoinType_e {
      NVL_JOIN_NONE,
      NVL_JOIN_ROUND,
      NVL_JOIN_MITER,
      NVL_NUM_JOINS,
    }NVLJoinType;

    typedef struct NVLStyleInfo_s {
      NVLSpaceType  projectionSpace;
      NVLJoinType   join;
      NVLCapsType   capsBegin;
      NVLCapsType   capsEnd;
      float         thickness;
      NVLStippleID  stipplePattern;
      float         stippleLength;
      float         stippleOffsetBegin;
      float         stippleOffsetEnd;
      NVLAnchorType stippleAnchor;
      NVLboolean    stippleClamp;
    } NVLStyleInfo;

SHADER-DRIVEN LINES
Sample implementation/library
- Uses GPU-friendly collection mechanism:
  - Record many primitives, then render
  - Optionally render sub-sections
- Raw Primitives pass vertex data directly
- Geometry Primitives reference existing Vertex Buffers
- Collections have usage-style flags:
  - filled new per-frame
  - recorded once, re-used many frames
Diagram (Geometry/Raw Recording): Geometry Primitives (VBO reference), Raw Primitives (Vertex values); further labels: Matrix, Color, Style reference

SHADER-DRIVEN LINES
Quad extrusion
- Faster geometry creation by just using Vertex-Shader, avoiding extra Geometry-Shader stage
- Render GL_QUADS (4 vertices each segment)
- Use gl_VertexID to fetch line points
    texelFetch(...gl_VertexID/4 + 0 or 1)
- Use it for the offsets as well
- Using custom vertex-fetch generally not recommended, but useful for special situations
Diagram labels: GS, VS, VertexBuffer, gl_VertexID % 4 + 0, gl_VertexID % 4 + 1

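The index arithmetic behind the gl_VertexID fetch can be sketched on the CPU. This is a minimal illustration, not the library's code: the struct `QuadCorner`, the function `decode_quad_vertex`, and the corner ordering (corners 0 and 1 at the segment start, 2 and 3 at the segment end, walking around the quad) are assumptions made for the example. It also assumes a line strip in which segment i connects points i and i + 1.

```c
#include <assert.h>

typedef struct {
    int   segment; /* which line segment this vertex belongs to */
    int   point;   /* index of the line point to fetch */
    float side;    /* -1 or +1: extrusion direction along the segment normal */
} QuadCorner;

/* Hypothetical decode of a GL_QUADS vertex index into segment, point, side. */
QuadCorner decode_quad_vertex(int vertexID)
{
    assert(vertexID >= 0);
    int corner = vertexID % 4;                 /* corner within the quad */
    QuadCorner c;
    c.segment = vertexID / 4;                  /* 4 vertices per segment */
    c.point   = c.segment + (corner >= 2 ? 1 : 0);
    c.side    = (corner == 0 || corner == 3) ? -1.0f : 1.0f;
    return c;
}
```

In a vertex shader the same two integer operations would select which point to fetch and in which direction to offset it, so no per-vertex attributes beyond the point buffer are needed.
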
SHADER-DRIVEN LINES
Minimize Overdraw
- Instead of naive rectangles, the adjacency information in a LINE_STRIP is used to tighten the geometry
- Reduces overdraw and minimizes the artifacts that can result from it

SHADER-DRIVEN LINES
Depth clamping
- Joints and caps exceed original line definition
- Can cause depth-buffer artifacts
- Prevent depth over-shooting by passing closest depth to fragment shader and clamp there
- Can use ARB_conservative_depth or just min/max to keep hardware z-cull active

    #extension GL_ARB_conservative_depth : require
    layout (depth_greater) out float gl_FragDepth;
    in flat float closestPointDepth;
    ...
    gl_FragDepth = max(gl_FragCoord.z, closestPointDepth);

DISTANCE COMPUTATION
- LINE_STRIPS need dedicated calculation phase
- Read vertices and calculate distances along the strip
- Sections drawn independently: fetch vertices & distances
- Distances are fetched at render-time
Diagram: VertexBuffer holds V0, V1, V2, V3 (strip length 4). DistanceBuffer holds D0, D1 = [0,1], D2 = [0,1]+[1,2], D3 = [0,1]+[1,2]+[2,3], where [i,j] is the distance between Vi and Vj. Each drawn section fetches its end points' distances (D0 and D1, D1 and D2, D2 and D3).

DISTANCE COMPUTATION
Shader Tips
- One LINE_STRIP per thread can lead to under-utilization and non-ideal memory access due to divergence
- SIMT hardware processes threads together in lock-step, with a common instruction pointer (masks out inactive threads)
- NVIDIA: 1 warp = 32 threads
Diagram: VertexBuffer index read by each thread in every iteration of the distance accumulation loop ("-" marks an idle thread):

    Thread:        0   1   2   3
    Strip length:  3   2   4   8
    Iteration 0:   0   3   5   9
    Iteration 1:   1   4   6  10
    Iteration 2:   2   -   7  11
    Iteration 3:   -   -   8  12
    Iteration 4:   -   -   -  13
    Iteration 5:   -   -   -  14
    Iteration 6:   -   -   -  15
    Iteration 7:   -   -   -  16

DISTANCE COMPUTATION
Shader Tips
- Compute one LINE_STRIP at a time across warp, gives nice memory fetch
- NV_shader_thread_shuffle to access neighbors and do prefix-sum calculation
- Short strips may still under-utilize warp, but are taking only one iteration

    vec3 posA = getPosition ( gl_ThreadInWarpNV + …)
    // Access neighbor point via shuffleUpNV and compute distance
    vec3 posB = shuffleUpNV (posA, 1, gl_WarpSizeNV);
    ... Handle first thread point differently
    float dist = distance(posA, posB);
    ... Prefix-sum over distances ...

Diagram: VertexBuffer index read by each thread per iteration of the distance accumulation loop, all threads working on the same strip:

    Thread:        0      1      2      3
    Strip length:  9      9      9      9
    Iteration 0:   0      1      2      3
      distance:    [0,0]  [0,1]  [1,2]  [2,3]
    Iteration 1:   4      5      6      7
    Iteration 2:   8      -      -      -

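The "prefix-sum over distances" step is elided on the slide. One common way to build it from shuffle-up operations is a Hillis-Steele inclusive scan, emulated here on the CPU with an array standing in for the lanes of a warp. This is a sketch of that general technique, not necessarily what the talk's shader does; `WARP_SIZE` and `warp_inclusive_scan` are names chosen for the example, and lanes that would read from below lane 0 are assumed to contribute zero.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Emulated inclusive prefix sum across one warp. Each pass mimics a
 * shuffle-up by delta: lane i reads the value lane i - delta held
 * before the pass, and adds it to its own value. */
void warp_inclusive_scan(float val[WARP_SIZE])
{
    float fetched[WARP_SIZE];
    for (int delta = 1; delta < WARP_SIZE; delta *= 2) {
        /* "shuffle" phase: every lane reads its lower neighbor first */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            fetched[lane] = (lane >= delta) ? val[lane - delta] : 0.0f;
        /* "add" phase: all lanes update in lock-step */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            val[lane] += fetched[lane];
    }
    assert(WARP_SIZE >= 1);
}
```

Feeding in the per-lane segment lengths (with 0 for the first lane) yields the accumulated distance in every lane after log2(32) = 5 passes. For strips longer than a warp, the last lane's total would have to be carried into the next iteration as an offset.
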
DISTANCE COMPUTATION
Batching & Latency hiding
- Memory-intensive operations prefer many threads to hide latency of fetch
- Would not "compute" distance for a single strip, but need many strips to work on
- Use one warp per strip if total amount of threads is low
- Hardware switches activity between entire warps
Diagram: Warp 0 to Warp 3 alternating between Fetch, Wait For Memory and Compute; the overlap gives the Effective Utilization

DISTANCE COMPUTATION
Batching & Latency hiding
- Launch overhead of compute dispatch not negligible for < 10 000 threads
- Use glEnable(GL_RASTERIZER_DISCARD); and Vertex-Shader to do compute work
- No shared memory, but warp data sharing as seen before (ARB_shader_ballot or NV_shader_thread_shuffle)

    ...
    // "Compute" alternative for few threads
    if (numThreads < FEW_THREADS){
      glUseProgram( vs );
      glEnable ( GL_RASTERIZER_DISCARD );
      glDrawArrays( GL_POINTS, 0, numThreads );
      glDisable ( GL_RASTERIZER_DISCARD );
    }
    else {
      glUseProgram( cs );
      numGroups = (numThreads+GroupSize-1)/GroupSize;
      glUniform1i (0, numThreads);
      glDispatchCompute ( numGroups, 1, 1 );
    }
    ...

Shader

    #if USE_COMPUTE
      layout (local_size_x=GROUP_SIZE) in;
      layout (location=0) uniform int numThreads;
      int threadID = int( gl_GlobalInvocationID.x );
    #else
      int threadID = int( gl_VertexID );
    #endif

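The host-side decision above can be isolated from the GL calls and checked on its own. The sketch below is illustrative only: `DispatchPlan`, `plan_dispatch` and the `fewThreads` parameter are hypothetical names, and the slide does not state the actual value of FEW_THREADS, so the threshold is left to the caller.

```c
#include <assert.h>

typedef enum { PATH_VERTEX_SHADER, PATH_COMPUTE } DispatchPath;

typedef struct {
    DispatchPath path;
    unsigned     numGroups; /* work groups to dispatch; 0 on the vertex path */
} DispatchPlan;

/* Hypothetical helper: picks the vertex-shader path for small workloads,
 * otherwise rounds the group count up so every thread is covered. */
DispatchPlan plan_dispatch(unsigned numThreads, unsigned groupSize,
                           unsigned fewThreads)
{
    assert(groupSize > 0);
    DispatchPlan plan;
    if (numThreads < fewThreads) {
        plan.path = PATH_VERTEX_SHADER;
        plan.numGroups = 0;
    } else {
        plan.path = PATH_COMPUTE;
        plan.numGroups = (numThreads + groupSize - 1) / groupSize;
    }
    return plan;
}
```

Because the group count is rounded up, the last group can contain invocations beyond the requested thread count. That is presumably why the compute variant of the shader receives `numThreads` as a uniform: a check such as `threadID >= numThreads` lets surplus invocations exit early.
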
SMOOTH TRANSITIONS
Anti-aliasing edges within shader
- Fragment shader effects cause outlines of visible shapes to be within geometry
- MSAA will not add quality "within triangle"
- Need to compute coverage accurately (sample-shading) or approximate
- Use of gl_SampleID (e.g. with interpolateAtSample) automatically makes shader run per-sample, "discard" will affect coverage mask properly
- Cheaper: GL_SAMPLE_ALPHA_TO_COVERAGE or clear bits in gl_SampleMask
Figure labels: No geometric edges; No MSAA benefit

    in float stippleCoord;
    ...
    sc = interpolateAtSample (stippleCoord, gl_SampleID);
    stippleResult = computeStippling( sc );
    if (stippleResult < 0) discard;

SMOOTH TRANSITIONS
Using Pixel Derivatives
- Simple trick to get smooth transitions, also works well on surface contour lines
- Use a signed distance field, instead of step function
- Find if sample is close to transition (zero crossing) via fwidth
- Compute smooth weight if required
Figure: step function (values 0 and 1) compared with a signed signal (values -1 to 1); fwidth ( signal ) defines the smoothing zone around zero, and the weight is blended within that zone

    float weight = signal < 0 ? -1 : 1;
    float zone = fwidth ( signal ) * 0.5;
    if (abs (signal) < zone){
      weight = signal / zone;
    }

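The weight computation above can be checked outside a shader by passing the derivative in explicitly. In this sketch `smooth_weight` is a hypothetical name and `fwidth_signal` stands in for the value the GLSL built-in `fwidth(signal)` would return at the fragment.

```c
#include <assert.h>

/* CPU transcription of the slide's snippet: returns -1 or +1 away from
 * the zero crossing and a linear ramp inside the smoothing zone. */
float smooth_weight(float signal, float fwidth_signal)
{
    assert(fwidth_signal >= 0.0f);
    float weight = signal < 0.0f ? -1.0f : 1.0f;
    float zone = fwidth_signal * 0.5f;
    float magnitude = signal < 0.0f ? -signal : signal;
    if (magnitude < zone) {
        weight = signal / zone;   /* ramps from -1 to +1 across the zone */
    }
    return weight;
}
```

For example, with a derivative of 1.0 the zone is 0.5 wide on each side of zero, so a signal of 0.25 yields a weight of 0.5 while a signal of 2.0 stays at 1.0. The division is only reached when the zone is strictly positive, so a zero derivative falls back to the hard step.
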