Network, Storage, and Workflow Design for GPU-Centric Film Production (or how we make the ******* chimichangas)
Simplified Workflow
Camera → Dailies → Editorial → Color Grading / Deliverables
Deadpool Armory
• Open Drives Velocity 36 TB Array
• Open Drives Exos (SSD/disk hybrid) 216 TB
• Mellanox SX1012
• 5x Mac Pro 2013
• 3x HP Z840
• 64 GB Memory
• Intel V3 2687 W
• 5x Nvidia M6000
• Solarflare 5622
Example Facility Diagram
Interesting Facts
1. The entire opening sequence for Deadpool was look dev’d and pitched using Vray RT renders created on M6000s.
2. With an Nvidia M6000 and relatively unlimited bandwidth, the entire offline of the Deadpool film can be rendered in 12 minutes.
3. If 6 editors all hit play at the same time in Adobe Premiere to fully load the GPU cache (for real-time effects absorption), total bandwidth draw can peak at 4 GB/s, even with offline-style codecs.
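As a rough back-of-the-envelope check on fact 3, the sketch below splits the quoted 4 GB/s peak evenly across six editors and compares each share with a client link. The 10 Gb link per editor is an assumption (the deck does not state client link speeds), decimal units are assumed, and protocol overhead is ignored.

```python
# Assumptions: decimal units (1 GB = 10**9 bytes), an even split across
# editors, and a hypothetical 10 Gb/s link per workstation.
PEAK_TOTAL_GB_S = 4.0   # peak aggregate draw quoted for six editors
EDITORS = 6
LINK_GBIT_S = 10.0      # assumed client link speed, not from the deck


def per_editor_peak(total_gb_s: float, editors: int) -> float:
    """Even share of the aggregate peak, in GB/s."""
    return total_gb_s / editors


def link_utilization(gb_s: float, link_gbit_s: float) -> float:
    """Fraction of raw link capacity used (8 bits per byte, no overhead)."""
    return gb_s / (link_gbit_s / 8.0)


share = per_editor_peak(PEAK_TOTAL_GB_S, EDITORS)
print(f"{share * 1000:.0f} MB/s per editor")
print(f"{link_utilization(share, LINK_GBIT_S):.0%} of an assumed 10 Gb link")
```

An even split is the friendly case. A single editor spiking above the average share gets much closer to saturating the link.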
The Big Problem
• GPU speed has resulted in unpredictable artist environments with extremely spiky IO patterns.
• At any given moment, editors, compositors, and now 3D lighting artists can completely saturate their link to central storage or create an IOPS storm.
• This creates unacceptable lag, reducing many of the benefits of a GPU-centered workflow.
• GPU applications are now extremely sensitive not only to lack of throughput but also to latency.
Methods For Dealing
Choose a low-latency network card. Response time for a 1 MB NFS call over 10 Gb:
1. Solarflare: 1.03 ms
2. Intel X540: 1.15 ms
3. Atto: 1.9 ms
4. Promise: 2.4 ms
In other words, a card that costs the same can easily double your scene load times. If the application is poorly written, we have seen up to a 5x slowdown in certain latency-sensitive tasks.
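To make the "double your scene load times" claim concrete, here is a minimal sketch that compares each card against the fastest one and estimates a fully serialized load. The latency figures are the ones listed above; the 10,000-call scene is a hypothetical workload, and the model assumes strictly sequential requests with no pipelining.

```python
# Response times for a 1 MB NFS call over 10 Gb, in milliseconds (from the list above).
LATENCY_MS = {
    "Solarflare": 1.03,
    "Intel X540": 1.15,
    "Atto": 1.9,
    "Promise": 2.4,
}


def relative_to_best(latencies: dict) -> dict:
    """Each card's latency as a multiple of the fastest card's latency."""
    best = min(latencies.values())
    return {name: ms / best for name, ms in latencies.items()}


def serialized_load_seconds(calls: int, latency_ms: float) -> float:
    """Total time if every call waits for the previous one to return."""
    return calls * latency_ms / 1000.0


CALLS = 10_000  # hypothetical scene: 10,000 sequential 1 MB reads

for name, ratio in relative_to_best(LATENCY_MS).items():
    secs = serialized_load_seconds(CALLS, LATENCY_MS[name])
    print(f"{name:<11} {ratio:4.2f}x slower than best, {secs:5.1f} s to load")
```

Under this purely serial model the slowest card takes about 2.3 times as long as the fastest. Applications that issue requests in parallel would see a smaller gap.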
Storage Side
• Intense predictive memory caching. We can achieve speeds of up to 40 GB/s, with latency measured in microseconds, out of our highest caching tier. This is currently limited by network interconnects.
• A block-level mechanism, particularly effective with wavelet or mip-map textures: we load what the engine needs.
• Deep low-latency analytics allow us to control bandwidth and IOPS, preventing a system lock scenario and maintaining a consistent level of service to all clients.
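One common way to cap per-client bandwidth so that a single workstation cannot starve the others is a token bucket. The sketch below is a generic illustration of that idea, not the Open Drives mechanism, which the deck does not describe. The class name, parameters, and the injected clock are all hypothetical.

```python
import time


class TokenBucket:
    """Per-client byte budget: refills at `rate` bytes/s, capped at `burst` bytes."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock            # injectable so the example is deterministic
        self.tokens = float(burst)    # start with a full bucket
        self.last = clock()

    def try_consume(self, nbytes):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Deterministic demo with a fake clock (values are arbitrary).
fake_now = [0.0]
bucket = TokenBucket(rate=100, burst=200, clock=lambda: fake_now[0])
print(bucket.try_consume(200))  # True: the initial burst is allowed
print(bucket.try_consume(1))    # False: bucket is empty, request must wait
fake_now[0] = 1.0               # one second later, 100 bytes have refilled
print(bucket.try_consume(100))  # True
```

A real storage controller would apply a budget like this to IOPS as well as bytes, and would queue deferred requests instead of rejecting them.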
Storage Side
• Currently Open Drives has optimization profiles for Adobe Premiere, Nuke, Vray, DaVinci, and Baselight.
• These profiles have let us, by way of example, load a 47,000-object project in under 75 seconds, roughly a 4x improvement over last year's project, Gone Girl.
• Further optimization on Deadpool let us average a cache hit ratio of over 96% out of the layer from which we can deliver up to 1.1 million 4 KB read IOPS.
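The figures on this slide can be converted into more familiar units with a little arithmetic. The sketch below does so, assuming "4 KB" means 4 KiB (4096 bytes) and treating the quoted limits ("under 75 seconds", "over 96%") as bounds, not exact measurements.

```python
IOPS = 1_100_000          # 4 KB read IOPS quoted for the cache layer
BLOCK_BYTES = 4 * 1024    # assumption: "4 KB" means 4 KiB
OBJECTS = 47_000          # objects in the example project
LOAD_SECONDS = 75         # quoted upper bound on load time
HIT_RATIO = 0.96          # quoted lower bound on cache hit ratio


def iops_to_gb_per_s(iops: int, block_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a fixed block size (decimal GB)."""
    return iops * block_bytes / 1e9


def objects_per_second(objects: int, seconds: float) -> float:
    """Average object load rate."""
    return objects / seconds


def misses_per_thousand(hit_ratio: float) -> float:
    """Requests per 1,000 that fall through to a slower tier."""
    return (1.0 - hit_ratio) * 1000.0


print(f"{iops_to_gb_per_s(IOPS, BLOCK_BYTES):.2f} GB/s at 4 KiB blocks")
print(f"at least {objects_per_second(OBJECTS, LOAD_SECONDS):.0f} objects/s")
print(f"at most {misses_per_thousand(HIT_RATIO):.0f} misses per 1,000 requests")
```

At small block sizes the IOPS limit, not raw bandwidth, is the constraint: 1.1 million 4 KiB reads per second works out to roughly 4.5 GB/s.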
Switch Side
• L2 LACP bonding is your friend.
• For video workflow, look at switch providers that can maintain low latency. Also completely segregate your production workflow network from your internet gigabit network.
• If you’re dealing with 4K or larger image sequences, go to jumbo frames. Client interrupt scheduling can be painful. RDMA methods on most client OSes are still immature.
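To see why jumbo frames ease client interrupt load, this sketch estimates frames per second at line rate for a standard (1500) and a jumbo (9000) MTU. The 10 Gb link speed is an assumption, and the 38 bytes of per-frame overhead assume untagged Ethernet (8 preamble, 14 header, 4 FCS, 12 inter-frame gap). Actual interrupt rates also depend on NIC coalescing settings, which are not modeled here.

```python
LINK_BITS_PER_S = 10e9     # assumed 10 Gb/s link
ETH_OVERHEAD_BYTES = 38    # preamble 8 + header 14 + FCS 4 + inter-frame gap 12


def frames_per_second(link_bits_per_s: float, mtu: int) -> float:
    """Maximum full-size frames per second the link can carry."""
    wire_bits_per_frame = (mtu + ETH_OVERHEAD_BYTES) * 8
    return link_bits_per_s / wire_bits_per_frame


standard = frames_per_second(LINK_BITS_PER_S, 1500)
jumbo = frames_per_second(LINK_BITS_PER_S, 9000)

print(f"MTU 1500: {standard:,.0f} frames/s")
print(f"MTU 9000: {jumbo:,.0f} frames/s")
print(f"Jumbo frames cut the frame rate by about {standard / jumbo:.1f}x")
```

Fewer frames for the same payload means fewer packets for the client to process, which is the practical benefit when streaming large image sequences.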
Thank You