S9391 GstCUDA: Easy GStreamer and CUDA Integration Eng. Daniel Garbanzo MSc. Michael Grüner GTC March 2019
Agenda
● About RidgeRun
● GStreamer Overview
● CUDA Overview
● GstCUDA Introduction
● Application Examples
● Performance Statistics
● GstCUDA Demo on TX2
● Q&A
About Us ● US Company - R&D Lab in Costa Rica ● 15 years of experience ● Embedded Linux and GStreamer experts ● Custom multimedia solutions ● Digital signal/image processing ● AI and Machine Learning solutions ● System optimization: CUDA, GStreamer, OpenCL, OpenGL, OpenVX, Vulkan ● Support for embedded and resource constrained systems ● Professional services, dedicated teams and specialized tools 3
Medical Industry Automotive Industry Smart Devices Computer Vision ● Complex multimedia applications require a lot of processing resources ● GStreamer offers a flexible way for creating multimedia applications ● CUDA offers high performance accelerated processing capabilities 4
● Open source framework for audio and video applications ● Based on a pipeline architecture ● Extensible design based on plugins (more than 1000 freely available) ● Automatic format and synchronization handling ● Tools for easy prototyping Modularity Portability Flexibility 5
Basic MP4 player GStreamer Pipeline ● Each plugin represents a different processing module ● The plugins are linked and arranged in a pipeline ● Freedom to build arbitrary pipelines for different applications 6
Modular design lets you change your application easily!
● Easily change from SW to HW accelerated processing
● Easily change your application end use
Modular design lets you change your application easily!
Code equivalent (SW encoding, saved to a file):
gst-launch v4l2src ! videoconvert ! x265enc ! mpegtsmux ! filesink
Code equivalent (HW accelerated encoding, streamed over UDP):
gst-launch v4l2src ! videoconvert ! omxh265enc ! mpegtsmux ! udpsink
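To make the element swap concrete, here is a minimal C++ sketch that builds a gst-launch style description from a list of element names and replaces one element with another. The helper names `build_pipeline` and `swap_element` are hypothetical and are not part of GStreamer or GstCUDA; the sketch only manipulates strings and does not run a pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join element names with " ! ", the link operator used in gst-launch syntax.
// Hypothetical helper for illustration only.
std::string build_pipeline(const std::vector<std::string>& elements) {
    assert(!elements.empty());
    std::string description = elements.front();
    for (std::size_t i = 1; i < elements.size(); ++i) {
        description += " ! " + elements[i];
    }
    return description;
}

// Return a copy of the element list with every occurrence of `from`
// replaced by `to`. Hypothetical helper for illustration only.
std::vector<std::string> swap_element(std::vector<std::string> elements,
                                      const std::string& from,
                                      const std::string& to) {
    for (auto& element : elements) {
        if (element == from) {
            element = to;
        }
    }
    return elements;
}
```

Swapping `x265enc` for `omxh265enc` and `filesink` for `udpsink` in the first element list yields the second one; the capture, conversion, and muxing stages stay untouched.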
GstCUDA 10
What Does GstCUDA Solve? 12
Integration Complexities
Development Time
Without GstCUDA:
● Create GStreamer plugin with CUDA support: 3 months
● Generate CUDA algorithm: 10 days
● Integrate CUDA algorithm: 5 days
● Total = 3.5 months
With GstCUDA:
● Generate CUDA algorithm: 10 days
● Integrate CUDA algorithm: 0.1 day
● Total = 10.1 days
● Reduce development time
● Focus on the CUDA logic
● Minimize time to market
Performance Bottleneck [Diagram labels: Memcpy, Memcpy]
Performance Bottleneck
Without GstCUDA:
● Data transfer bottlenecks cause poor performance
● Limited framerate at high resolutions
With GstCUDA:
● Efficient memory handling improves performance
● Up to 2x 4K@60fps
Supported Platforms
● Focused on NVIDIA embedded platforms: Jetson TX1, TX2, TX2i, Jetson AGX Xavier, and Jetson Nano
GstCUDA Key Features 18
Framework Overview 20
Quick Prototyping Elements 21
Cudafilter Element location = median_filter.so 22
Cudamux Element IR location = thermal_overlay.so 23
CUDA Algorithm Interface
● Make your CUDA algorithm compatible by implementing these interfaces
Cudafilter Interface:
    bool open();
    bool close();
    bool process (const GstCudaData &inbuf, GstCudaData &outbuf);
    bool process_ip (const GstCudaData &inbuf, GstCudaData &outbuf);
Cudamux Interface:
    bool open();
    bool close();
    bool process (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
    bool process_ip (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
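As an illustration of the Cudafilter interface, the following C++ sketch implements the four methods for a simple pixel inversion. Several things here are assumptions and not taken from GstCUDA: the `GstCudaData` struct is a hypothetical stand-in (the real type and its fields are not shown in the slides), the class name `InvertFilter` is invented, and the loops run on the CPU where a real algorithm would launch a CUDA kernel. The sketch also assumes that for `process_ip` the caller passes buffers of equal size, typically the same buffer for both arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL stand-in for GstCudaData; the real definition comes from GstCUDA.
struct GstCudaData {
    std::vector<std::uint8_t> pixels;  // assumed: a single 8-bit plane
};

// Example algorithm exposing the four Cudafilter methods.
class InvertFilter {
public:
    bool open() { opened_ = true; return true; }     // set up resources
    bool close() { opened_ = false; return true; }   // release resources

    // Not in place: read inbuf and write the result to a separate outbuf.
    bool process(const GstCudaData& inbuf, GstCudaData& outbuf) {
        if (!opened_) return false;
        outbuf.pixels.resize(inbuf.pixels.size());
        for (std::size_t i = 0; i < inbuf.pixels.size(); ++i) {
            outbuf.pixels[i] = static_cast<std::uint8_t>(255 - inbuf.pixels[i]);
        }
        return true;
    }

    // In place: no allocation; assumes both buffers already have the same size
    // (in this sketch they may be the very same object).
    bool process_ip(const GstCudaData& inbuf, GstCudaData& outbuf) {
        if (!opened_) return false;
        assert(inbuf.pixels.size() == outbuf.pixels.size());
        for (std::size_t i = 0; i < outbuf.pixels.size(); ++i) {
            outbuf.pixels[i] = static_cast<std::uint8_t>(255 - inbuf.pixels[i]);
        }
        return true;
    }

private:
    bool opened_ = false;
};
```

Each output pixel depends only on the input pixel at the same index, so this particular operation is safe to run in place.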
Buffer Processing Methods process_ip (In place) process (Not in place) 25
Create Your Custom Element
● Some applications may require specialized elements
● GstCUDA provides base classes to simplify development
GstCUDA Framework Usage Example
GstCUDA Framework Summary
● The framework includes:
GstCUDA API:
● Utils to handle memory interfaces
● GStreamer Unified Memory allocators
● Parent classes for different topologies
Quick prototyping elements:
● Generic elements to evaluate custom algorithms
● Runtime loading of the CUDA algorithms
Set of examples:
● Complete GstCUDA element boilerplate
● CUDA algorithms for the prototyping elements
GstCUDA Application Areas Examples Video 29
Industrial Applications: Border Enhancement 30
Automation Applications: Hough Transform 31
Security Applications: Motion Detection/Estimation 32
Performance Statistics 33
Varying Algorithm / Fixed Image Size Test Conditions ● Image convolution algorithm location = convolution.so ● Stressing compute capabilities ● Variable convolution kernel size ● 1080p@240fps / 1080p@60fps stream input ● Cudafilter element ● Unified Memory allocator ● Jetson TX2 platform ● Not In-place 34
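To show why a variable kernel size stresses compute, here is a CPU reference sketch of a k x k image convolution in C++. It is not the `convolution.so` algorithm used in the test, whose source is not shown in the slides; the function names, the single-channel float layout, and the clamp-to-edge border handling are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a k x k kernel (row-major, k odd) to a single-channel image.
// The kernel is applied without flipping, which is identical to a true
// convolution for symmetric kernels. Borders clamp to the nearest edge pixel.
std::vector<float> convolve(const std::vector<float>& image, int width, int height,
                            const std::vector<float>& kernel, int k) {
    assert(k > 0 && k % 2 == 1);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(kernel.size() == static_cast<std::size_t>(k) * k);
    const int r = k / 2;
    std::vector<float> out(image.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    const int sy = std::clamp(y + ky, 0, height - 1);
                    const int sx = std::clamp(x + kx, 0, width - 1);
                    acc += image[sy * width + sx] * kernel[(ky + r) * k + (kx + r)];
                }
            }
            out[y * width + x] = acc;
        }
    }
    return out;
}

// Multiply-accumulate operations per frame: grows with the square of k.
long long macs_per_frame(int width, int height, int k) {
    return static_cast<long long>(width) * height * k * k;
}
```

For a fixed 1080p frame, going from a 3x3 to a 9x9 kernel multiplies the per-frame work by 9, which is the kind of scaling this test varies while the image size stays constant.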
Varying Algorithm / Fixed Image Size Framerate Stats 35
Varying Algorithm / Fixed Image Size Processing Time Stats 36
Varying Algorithm / Fixed Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 37
Fixed Algorithm / Varying Image Size Test Conditions
● Memory copy algorithm location = memcpy.so
● Stressing data transfer
● Variable input resolution
● Cudafilter element
● Unified Memory allocator
● Jetson TX2 platform
● In-place vs. not in-place
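A rough way to see how resolution drives data transfer is to compute the raw stream size. The C++ sketch below assumes I420 frames (12 bits per pixel); the slides do not state the pixel format used in this test, so treat that as an assumption. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one I420 frame: a full-resolution Y plane plus two chroma planes
// subsampled by 2 in each dimension, i.e. width * height * 3 / 2.
// ASSUMPTION: I420 format; the test's actual format is not stated.
std::uint64_t i420_frame_bytes(std::uint64_t width, std::uint64_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2;
}

// Raw bytes per second that the stream delivers at a given framerate.
std::uint64_t stream_bytes_per_second(std::uint64_t width, std::uint64_t height,
                                      std::uint64_t fps) {
    return i420_frame_bytes(width, height) * fps;
}
```

Under this assumption a 1920x1080 stream at 60 fps carries 186,624,000 bytes per second and a 3840x2160 stream at 60 fps carries 746,496,000. A not-in-place copy reads and writes every frame, so its memory traffic is roughly twice the stream rate.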
Fixed Algorithm / Varying Image Size Framerate Stats Note: Maximum Framerate limited to 245 fps by the video source 39
Fixed Algorithm / Varying Image Size Processing Time Stats 40
Fixed Algorithm / Varying Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 41
Fixed Algorithm / Varying Image Size Test Conditions ● Simple image mixing algorithm location = mixer.so ● Stressing data transfer ● Variable input resolution ● Cudamux element ● Unified Memory allocator ● In-place=True ● Jetson TX2 platform 42
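For orientation, a simple mixing algorithm written against the Cudamux interface might look like the C++ sketch below. It is not the `mixer.so` used in the test. The `GstCudaData` struct is a hypothetical stand-in for the real GstCUDA type, the class name `AverageMixer` is invented, the loop runs on the CPU in place of a CUDA kernel, and the sketch assumes that in in-place mode the output buffer is one of the input buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL stand-in for GstCudaData; the real definition comes from GstCUDA.
struct GstCudaData {
    std::vector<std::uint8_t> pixels;  // assumed: a single 8-bit plane
};

// Example mixer: each output pixel is the average of the input pixels.
class AverageMixer {
public:
    bool open() { return true; }
    bool close() { return true; }

    // In place: the result overwrites outbuf, which in this sketch may be
    // the same object as one of the inputs (for example inbufs[0]).
    bool process_ip(std::vector<GstCudaData>& inbufs, GstCudaData& outbuf) {
        assert(!inbufs.empty());
        const std::size_t n = outbuf.pixels.size();
        for (const auto& in : inbufs) {
            if (in.pixels.size() != n) return false;  // all frames must match
        }
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t sum = 0;
            for (const auto& in : inbufs) {
                sum += in.pixels[i];  // read every input before writing
            }
            outbuf.pixels[i] = static_cast<std::uint8_t>(sum / inbufs.size());
        }
        return true;
    }
};
```

Because every input pixel at index `i` is read before the output at index `i` is written, reusing an input buffer as the output does not corrupt the result.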
Fixed Algorithm / Varying Image Size Framerate Stats Note: Maximum Framerate limited to 240fps by the video source 43
Fixed Algorithm / Varying Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 44
GstCUDA Live Demo on Jetson TX2
Sobel Filter 1080p60fps
Code equivalent:
gst-launch-1.0 nvcamerasrc sensor-id=2 fpsRange=60,60 ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1,format=I420" ! nvvidconv ! "video/x-raw" ! queue ! cudafilter in-place=false location=/borders.so ! queue ! nvoverlaysink
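For readers unfamiliar with the Sobel filter, the following C++ sketch is a CPU reference of the operator on an 8-bit grayscale image. It is not the source of `borders.so`, which is not shown in the slides. The function name, the |gx| + |gy| magnitude approximation, and leaving the one-pixel border at zero are choices made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sobel edge magnitude of an 8-bit grayscale image (row-major).
// Magnitude is approximated as |gx| + |gy| and clamped to 255.
// Border pixels are left at 0 because their 3x3 neighborhood is incomplete.
std::vector<std::uint8_t> sobel(const std::vector<std::uint8_t>& img,
                                int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(img.size(), 0);
    auto p = [&](int x, int y) { return static_cast<int>(img[y * width + x]); };
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            // Horizontal gradient: right column minus left column.
            const int gx = (p(x + 1, y - 1) + 2 * p(x + 1, y) + p(x + 1, y + 1))
                         - (p(x - 1, y - 1) + 2 * p(x - 1, y) + p(x - 1, y + 1));
            // Vertical gradient: bottom row minus top row.
            const int gy = (p(x - 1, y + 1) + 2 * p(x, y + 1) + p(x + 1, y + 1))
                         - (p(x - 1, y - 1) + 2 * p(x, y - 1) + p(x + 1, y - 1));
            out[y * width + x] = static_cast<std::uint8_t>(
                std::min(std::abs(gx) + std::abs(gy), 255));
        }
    }
    return out;
}
```

In a GPU version each output pixel would typically map to one CUDA thread, since every pixel is computed independently from the input frame. That independence also explains why a not-in-place buffer suits this filter: neighbors must be read from the unmodified input.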
Resources ● GstCUDA wiki page: ○ gstcuda.ridgerun.com ● RidgeRun Website: ○ ridgerun.com ● RidgeRun Contact: ○ ridgerun.com/contact 46