
S9391 GstCUDA: Easy GStreamer and CUDA Integration


  1. S9391 GstCUDA: Easy GStreamer and CUDA Integration Eng. Daniel Garbanzo MSc. Michael Grüner GTC March 2019

  2. Agenda ● About RidgeRun ● GStreamer Overview ● CUDA Overview ● GstCUDA Introduction ● Application Examples ● Performance Statistics ● GstCUDA Demo on TX2 ● Q&A

  3. About Us ● US Company - R&D Lab in Costa Rica ● 15 years of experience ● Embedded Linux and GStreamer experts ● Custom multimedia solutions ● Digital signal/image processing ● AI and Machine Learning solutions ● System optimization: CUDA, GStreamer, OpenCL, OpenGL, OpenVX, Vulkan ● Support for embedded and resource-constrained systems ● Professional services, dedicated teams and specialized tools

  4. Medical Industry, Automotive Industry, Smart Devices, Computer Vision ● Complex multimedia applications require a lot of processing resources ● GStreamer offers a flexible way of creating multimedia applications ● CUDA offers high-performance accelerated processing capabilities

  5. ● Open source framework for audio and video applications ● Based on a pipeline architecture ● Extensible design based on plugins (more than 1000 freely available) ● Automatic format and synchronization handling ● Tools for easy prototyping ● Modularity, Portability, Flexibility

  6. Basic MP4 player GStreamer Pipeline ● Each plugin represents a different processing module ● The plugins are linked and arranged in a pipeline ● Freedom to build arbitrary pipelines for different applications

  7. Modular design lets you change your application easily! ● Easily change from SW to HW accelerated processing ● Easily change your application end use

  8. Modular design lets you change your application easily! ● Code equivalent (software encoding, written to a file): gst-launch v4l2src ! videoconverter ! x265enc ! mpegtsmux ! filesink ● Code equivalent (hardware-accelerated encoding, sent over UDP): gst-launch v4l2src ! videoconverter ! omxh265enc ! mpegtsmux ! udpsink
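
The swap between the two pipelines on this slide can be sketched in plain C++ as string manipulation over the element list. This only illustrates the modularity idea and is not GStreamer API usage: `build_pipeline` and `swap_element` are hypothetical helpers, and the element names are copied from the slide as written (on a real GStreamer 1.0 system the converter element is named `videoconvert`).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join element names with " ! ", the link operator used in gst-launch
// pipeline descriptions. Hypothetical helper, not part of GStreamer.
std::string build_pipeline(const std::vector<std::string> &elements) {
  std::string description;
  for (std::size_t i = 0; i < elements.size(); ++i) {
    if (i > 0) description += " ! ";
    description += elements[i];
  }
  return description;
}

// Return a copy of the element list with every occurrence of `from`
// replaced by `to`; all other elements stay untouched.
std::vector<std::string> swap_element(std::vector<std::string> elements,
                                      const std::string &from,
                                      const std::string &to) {
  for (auto &element : elements) {
    if (element == from) element = to;
  }
  return elements;
}

// Element list of the first pipeline, names as they appear on the slide.
std::vector<std::string> software_to_file() {
  return {"v4l2src", "videoconverter", "x265enc", "mpegtsmux", "filesink"};
}

// Second pipeline: swap the encoder for the hardware one and the file
// sink for a UDP sink, leaving the rest of the pipeline as it was.
std::vector<std::string> hardware_to_udp() {
  return swap_element(
      swap_element(software_to_file(), "x265enc", "omxh265enc"),
      "filesink", "udpsink");
}
```

On a real system the resulting string would be handed to `gst-launch-1.0` or `gst_parse_launch()`; only two of the five elements change between the two variants.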

  9.

  10. GstCUDA

  11. GstCUDA

  12. What Does GstCUDA Solve?

  13. Integration Complexities

  14. Development Time ● Without GstCUDA: Create GStreamer plugin with CUDA support (3 months) + Generate CUDA algorithm (10 days) + Integrate CUDA algorithm (5 days); Total = 3.5 months ● With GstCUDA: Generate CUDA algorithm (10 days) + Integrate CUDA algorithm (0.1 day); Total = 10.1 days ● Reduce development time ● Focus on the CUDA logic ● Minimize time to market

  15. Performance Bottleneck (diagram labels: Memcpy, Memcpy)

  16. Performance Bottleneck ● Without GstCUDA: data transfer bottlenecks cause poor performance; limited framerate at high resolutions ● With GstCUDA: efficient memory handling improves performance; up to 2x 4K@60fps
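
A rough sense of why extra copies hurt at high resolutions comes from the raw data volume. The sketch below is back-of-the-envelope arithmetic under stated assumptions, none of which come from this slide: I420 frames (12 bits per pixel, the format used in the demo caps on slide 45), 4K taken as 3840x2160, and a hypothetical `copies` parameter for the number of times each frame is copied.

```cpp
#include <cstdint>

// Bytes in one I420 frame: a full-resolution Y plane plus two chroma
// planes at quarter resolution, i.e. 1.5 bytes per pixel.
// Assumes even width and height.
constexpr std::uint64_t i420_frame_bytes(std::uint64_t width,
                                         std::uint64_t height) {
  return width * height * 3 / 2;
}

// Bytes moved per second when every frame is copied `copies` times,
// e.g. copies = 2 for one host-to-device and one device-to-host memcpy.
constexpr std::uint64_t copy_bytes_per_second(std::uint64_t width,
                                              std::uint64_t height,
                                              std::uint64_t fps,
                                              std::uint64_t copies) {
  return i420_frame_bytes(width, height) * fps * copies;
}
```

Under these assumptions a single 4K@60fps stream is about 746 MB/s of pixel data, and each additional memcpy per frame adds the same amount again.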

  17. Supported Platforms ● Focused on NVIDIA embedded platforms: Jetson TX1, TX2, TX2i, Jetson AGX Xavier and Jetson Nano

  18. GstCUDA Key Features

  19. GstCUDA Key Features

  20. Framework Overview

  21. Quick Prototyping Elements

  22. Cudafilter Element ● location = median_filter.so

  23. Cudamux Element (diagram label: IR) ● location = thermal_overlay.so

  24. CUDA Algorithm Interface ● Make your CUDA algorithm compatible by implementing these interfaces ● Cudafilter Interface: bool open(); bool close(); bool process (const GstCudaData &inbuf, GstCudaData &outbuf); bool process_ip (const GstCudaData &inbuf, GstCudaData &outbuf); ● Cudamux Interface: bool open(); bool close(); bool process (vector<GstCudaData> &inbufs, GstCudaData &outbuf); bool process_ip (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
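
The shape of the cudafilter interface above can be illustrated with a host-only sketch. Everything here is an assumption made for illustration: `HostFrame` is a hypothetical stand-in for `GstCudaData` (whose real layout is not shown on the slides), the "algorithm" is a plain CPU pixel inversion instead of a CUDA kernel, and `process_ip` assumes the caller passes the same frame as both arguments.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GstCudaData: one 8-bit plane in host memory.
struct HostFrame {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;  // width * height bytes, row-major
};

// Host-only sketch mirroring the four calls of the cudafilter interface.
class InvertFilter {
 public:
  bool open() { ready_ = true; return true; }    // acquire resources
  bool close() { ready_ = false; return true; }  // release resources

  // Not in place: read inbuf and write the result to a separate outbuf.
  bool process(const HostFrame &inbuf, HostFrame &outbuf) {
    if (!ready_ || !valid(inbuf)) return false;
    outbuf.width = inbuf.width;
    outbuf.height = inbuf.height;
    outbuf.pixels.resize(inbuf.pixels.size());
    for (std::size_t i = 0; i < inbuf.pixels.size(); ++i) {
      outbuf.pixels[i] = static_cast<std::uint8_t>(255 - inbuf.pixels[i]);
    }
    return true;
  }

  // In place: the slide's signature still lists two buffers; this sketch
  // assumes both refer to the same frame and only touches outbuf.
  bool process_ip(const HostFrame & /*inbuf*/, HostFrame &outbuf) {
    if (!ready_ || !valid(outbuf)) return false;
    for (auto &pixel : outbuf.pixels) {
      pixel = static_cast<std::uint8_t>(255 - pixel);
    }
    return true;
  }

 private:
  // A frame is valid when its pixel count matches its dimensions.
  static bool valid(const HostFrame &frame) {
    return frame.width > 0 && frame.height > 0 &&
           frame.pixels.size() == static_cast<std::size_t>(frame.width) *
                                      static_cast<std::size_t>(frame.height);
  }
  bool ready_ = false;
};
```

The not-in-place path needs a second buffer for every frame, while the in-place path reuses the input memory, which is the trade-off the in-place versus not-in-place measurements later in the deck compare.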

  25. Buffer Processing Methods ● process_ip (in place) ● process (not in place)

  26. Create Your Custom Element ● Some applications may require specialized elements ● GstCUDA provides base classes to simplify development

  27. GstCUDA Framework Usage Example

  28. GstCUDA Framework Summary ● The framework includes: ● GstCUDA API: utils to handle memory interfaces; GStreamer Unified Memory allocators; parent classes for different topologies ● Quick prototyping elements: generic elements to evaluate custom algorithms; runtime loading of the CUDA algorithms ● Set of examples: complete GstCUDA element boilerplate; CUDA algorithms for the prototyping elements

  29. GstCUDA Application Areas: Examples Video

  30. Industrial Applications: Border Enhancement

  31. Automation Applications: Hough Transform

  32. Security Applications: Motion Detection/Estimation

  33. Performance Statistics

  34. Varying Algorithm / Fixed Image Size: Test Conditions ● Image convolution algorithm (location = convolution.so) ● Stressing compute capabilities ● Variable convolution kernel size ● 1080p@240fps / 1080p@60fps stream input ● Cudafilter element ● Unified Memory allocator ● Jetson TX2 platform ● Not in-place
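
Why a variable kernel size stresses compute can be seen from a host-side reference. This is a minimal CPU sketch, not the contents of `convolution.so`: the function names are hypothetical, the image is assumed to be a single float channel, the kernel size `k` is assumed odd, pixels outside the image are treated as zero, and the kernel is applied without flipping (cross-correlation, which matches convolution for symmetric kernels).

```cpp
#include <cstddef>
#include <vector>

// Multiply-accumulate operations per frame for a k x k kernel:
// the cost grows with the square of the kernel size.
constexpr unsigned long long macs_per_frame(unsigned long long width,
                                            unsigned long long height,
                                            unsigned long long k) {
  return width * height * k * k;
}

// Apply a k x k kernel to a row-major single-channel image.
// Out-of-bounds taps are skipped, i.e. the border is zero-padded.
std::vector<float> apply_kernel(const std::vector<float> &image, int width,
                                int height, const std::vector<float> &kernel,
                                int k) {
  std::vector<float> out(image.size(), 0.0f);
  const int radius = k / 2;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      float acc = 0.0f;
      for (int ky = 0; ky < k; ++ky) {
        for (int kx = 0; kx < k; ++kx) {
          const int sy = y + ky - radius;
          const int sx = x + kx - radius;
          if (sy < 0 || sy >= height || sx < 0 || sx >= width) continue;
          acc += image[static_cast<std::size_t>(sy) * width + sx] *
                 kernel[static_cast<std::size_t>(ky) * k + kx];
        }
      }
      out[static_cast<std::size_t>(y) * width + x] = acc;
    }
  }
  return out;
}
```

For a fixed 1080p frame, going from a 3x3 to a 5x5 kernel raises the per-frame work from about 18.7 million to about 51.8 million multiply-accumulates, while the amount of data transferred stays the same.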

  35. Varying Algorithm / Fixed Image Size: Framerate Stats

  36. Varying Algorithm / Fixed Image Size: Processing Time Stats

  37. Varying Algorithm / Fixed Image Size: CPU Load Stats and GPU Load Stats (*baseline = simple capture pipeline without GstCUDA)

  38. Fixed Algorithm / Varying Image Size: Test Conditions ● Memory copy algorithm (location = memcpy.so) ● Stressing data transfer ● Variable input resolution ● Cudafilter element ● Unified Memory allocator ● Jetson TX2 platform ● In-place vs. not in-place

  39. Fixed Algorithm / Varying Image Size: Framerate Stats ● Note: Maximum framerate limited to 245 fps by the video source

  40. Fixed Algorithm / Varying Image Size: Processing Time Stats

  41. Fixed Algorithm / Varying Image Size: CPU Load Stats and GPU Load Stats (*baseline = simple capture pipeline without GstCUDA)

  42. Fixed Algorithm / Varying Image Size: Test Conditions ● Simple image mixing algorithm (location = mixer.so) ● Stressing data transfer ● Variable input resolution ● Cudamux element ● Unified Memory allocator ● In-place = true ● Jetson TX2 platform
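
A simple image mixing step with two inputs might look like the host-only sketch below. It is not the code in `mixer.so`: `HostPlane` and `mix_into_first` are hypothetical names, the mix is assumed to be an equal-weight average, and "in place" is assumed to mean that the result overwrites the first input buffer instead of going to a newly allocated one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one input buffer: a single 8-bit plane.
struct HostPlane {
  int width;
  int height;
  std::vector<std::uint8_t> pixels;  // width * height bytes, row-major
};

// Average two equally sized inputs, rounding to nearest, and store the
// result in inbufs[0]. Returns false when the inputs do not match.
bool mix_into_first(std::vector<HostPlane> &inbufs) {
  if (inbufs.size() != 2) return false;
  HostPlane &dst = inbufs[0];
  const HostPlane &src = inbufs[1];
  if (dst.width != src.width || dst.height != src.height ||
      dst.pixels.size() != src.pixels.size()) {
    return false;
  }
  for (std::size_t i = 0; i < dst.pixels.size(); ++i) {
    const int sum = dst.pixels[i] + src.pixels[i] + 1;  // +1 rounds up ties
    dst.pixels[i] = static_cast<std::uint8_t>(sum / 2);
  }
  return true;
}
```

Each output pixel costs one addition and one division, so a test built on this kind of algorithm is dominated by how fast the buffers can be moved, not by arithmetic.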

  43. Fixed Algorithm / Varying Image Size: Framerate Stats ● Note: Maximum framerate limited to 240 fps by the video source

  44. Fixed Algorithm / Varying Image Size: CPU Load Stats and GPU Load Stats (*baseline = simple capture pipeline without GstCUDA)

  45. GstCUDA Live Demo on Jetson TX2 ● Sobel Filter, 1080p@60fps ● Code equivalent: gst-launch-1.0 nvcamerasrc sensor-id=2 fpsRange=60,60 ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1,format=I420" ! nvvidconv ! "video/x-raw" ! queue ! cudafilter in-place=false location=/borders.so ! queue ! nvoverlaysink
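
For readers unfamiliar with the Sobel filter named on this slide, the sketch below shows the operator on the CPU. It is a reference illustration only and makes no claim about what `borders.so` contains: the function name is hypothetical, the input is assumed to be a single 8-bit luma plane, border pixels are left at zero, and the gradient magnitude is approximated as |gx| + |gy| clamped to 255.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sobel edge magnitude over a row-major 8-bit plane.
// Returns an empty vector when the size does not match the dimensions.
std::vector<std::uint8_t> sobel(const std::vector<std::uint8_t> &src,
                                int width, int height) {
  if (width <= 0 || height <= 0 ||
      src.size() != static_cast<std::size_t>(width) *
                        static_cast<std::size_t>(height)) {
    return {};
  }
  std::vector<std::uint8_t> dst(src.size(), 0);
  // Read one pixel as int so the gradient sums cannot overflow.
  auto px = [&](int x, int y) {
    return static_cast<int>(src[static_cast<std::size_t>(y) * width + x]);
  };
  for (int y = 1; y + 1 < height; ++y) {
    for (int x = 1; x + 1 < width; ++x) {
      // Horizontal gradient: right column minus left column.
      const int gx = (px(x + 1, y - 1) + 2 * px(x + 1, y) + px(x + 1, y + 1)) -
                     (px(x - 1, y - 1) + 2 * px(x - 1, y) + px(x - 1, y + 1));
      // Vertical gradient: bottom row minus top row.
      const int gy = (px(x - 1, y + 1) + 2 * px(x, y + 1) + px(x + 1, y + 1)) -
                     (px(x - 1, y - 1) + 2 * px(x, y - 1) + px(x + 1, y - 1));
      dst[static_cast<std::size_t>(y) * width + x] =
          static_cast<std::uint8_t>(std::min(255, std::abs(gx) + std::abs(gy)));
    }
  }
  return dst;
}
```

Every output pixel depends only on its 3x3 neighborhood, which is why this kind of filter maps naturally onto one GPU thread per pixel.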

  46. Resources ● GstCUDA wiki page: gstcuda.ridgerun.com ● RidgeRun website: ridgerun.com ● RidgeRun contact: ridgerun.com/contact
