
VisionWorks: A CUDA Accelerated Computer Vision Library (S6783)

Elif Albuz, April 4, 2016. April 4-7, 2016 | Silicon Valley


  1. VisionWorks™: A CUDA Accelerated Computer Vision Library. Session S6783, Elif Albuz, April 4, 2016. April 4-7, 2016 | Silicon Valley

  2. Agenda: Motivation; Introduction to VisionWorks™; VisionWorks™ Software Stack; VisionWorks™ Programming Model; Conclusion; Demo

  3. Computer Vision: Intelligent Video Analytics, Autonomous Driving, Robotics, Drones, Augmented Reality

  4. Computer Vision

  5. Computer Vision App Development: Concept -> Reference Implementation -> Port to target & optimize -> Product

  6. VisionWorks™ Motivation
      • Deliver high-performance, robust computer vision primitives
      • Ease development of computer vision applications on Tegra platforms
      • Accelerate the prototype-to-product cycle
      Example images: depth map, optical flow, corner detection

  7. VisionWorks™ at a Glance
      • CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms)
      • Flexible framework for seamlessly adding user-defined primitives
      • Interoperability with OpenCV
      • Thread-safe API
      • Documentation, tutorials, and sample software pipelines that teach use of the primitives and framework

  8. VisionWorks™ Supported Platforms
      • Automotive: Drive PX, Drive PX2, Jetson TK1 Pro
      • Embedded: Jetson TX1, Jetson TK1
      • Desktop: Ubuntu Linux 14.04, Windows 8

  9. VisionWorks™ Toolkit Software Stack (layers, top to bottom)
      • VisionWorks-Plus: Object Tracker, SfM, ...
      • VisionWorks Source Samples: Feature Tracking, Hough Transform, Stereo Depth Extraction, Camera Hist Equalization...
      • NVXIO: Multimedia Abstraction
      • VisionWorks Core Library: NVIDIA VisionWorks Framework & Primitive Extensions; Khronos OpenVX™ Framework & Primitives; VisionWorks CUDA API
      • NVIDIA CUDA Acceleration Framework

  10. VisionWorks™ Primitives (all OpenVX primitives plus NVIDIA extension primitives; "+" marks a type/mode extension by NVIDIA)
      • Image arithmetic: Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add/Subtract/Multiply +, Channel Combine, Channel Extract, Color Convert +, CopyImage, Convert Depth, Magnitude, MultiplyByScalar, Not/Or/And/Xor, Phase, Table Lookup, Threshold
      • Flow & depth: Median Flow, Optical Flow (LK) +, Semi-Global Matching, Stereo Block Matching, IME Create Motion Field, IME Refine Motion Field, IME Partition Motion Field
      • Features: Canny Edge Detector, FAST Corners +, FAST Track, Harris Corners +, Harris Track, Hough Circles, Hough Lines
      • Geometric transforms: Affine Warp +, Warp Perspective +, Flip Image, Remap, Scale Image +
      • Analysis: Histogram, Histogram Equalization, Integral Image, Mean Std Deviation, Min Max Locations
      • Filters: BoxFilter, Convolution, Dilation Filter, Erosion Filter, Gaussian Filter, Gaussian Pyramid, Laplacian3x3, Median Filter, Scharr3x3, Sobel 3x3

  11. VisionWorks™ Primitives (all OpenVX primitives plus NVIDIA extensions)
      • VisionWorks primitives are CUDA optimized (except the MedianFlow & FindHomography extensions)
      • 85% of the VisionWorks OpenVX API is also accelerated with NEON. The NEON-optimized primitives are listed in the VisionWorks Toolkit Reference (go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API")
      • Primitive acceleration with VisionWorks:
        • Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (average 8x)
        • Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (average 2x)
      (Measured on Drive PX, OS='V4L', Linux Kernel='3.18.21-tegra-g06aec38', CPU Rate='1632 MHz', GPU Rate='844 MHz', EMC Rate='1600 MHz')

  12. VisionWorks™ Sample Applications
      • Feature Tracker
      • Stereo Depth Extraction
      • OpenCV-NPP-OpenVX Interop
      • Hough Lines & Circles
      • Video stabilization
      • Iterative Motion Estimation/Flow
      • Other platform-specific samples (available only on certain platforms)
      • Camera Capture, OpenGL interop, Video playback

  13. VisionWorks Sample Applications: NVXIO Multimedia Abstraction. [Diagram: NVXIO abstracts camera input, video/image file input and streamed video/image input, image/video decode and encode, and rendering through Interop/EGLStreams around CUDA vision processing. The hardware block diagram shows CSI, ISP and camera processing, GFX, GPU, a multi-core ARM v8 CPU complex, video encoder and decoder, 2D engine (VIC), audio engine (APE), boot processor (BPMP), image processor (ISP), security, safety and CAN processing blocks (SPE, SCE, HSM), and I/O.]

  14. VisionWorks™ Plus Algorithms: Object Tracker, Structure From Motion

  15. Programming with the VisionWorks Library

  16. VisionWorks™ Programming Model
      • VisionWorks OpenVX™ Immediate Mode: standard specified heterogeneous compute API with individual function calls
      • VisionWorks OpenVX™ Graph Mode: heterogeneous compute API with graph optimizations; extensible with user-defined nodes
      • VisionWorks CUDA API: direct CUDA API for advanced CUDA developers

  17. VisionWorks OpenVX™ Immediate Mode: Video Stabilization Sample
      • The OpenVX Immediate Mode API enables developers to port their applications easily.
      • OpenVX Immediate Mode calls are prefixed with "vxu".
      • The OpenCV video stabilization algorithm was ported to VisionWorks Immediate Mode.
      [Diagram: pipeline stages are OpenCV image source, cv::Mat to vx_image, color conversion, image pyramid, feature detection, optical flow, process points & find homography, warp perspective, stabilized frames.]

  18. VisionWorks OpenVX™ Immediate Mode: Video Stabilization Sample
      • Performance boost: the video stabilization application is accelerated by 2.6x (including the overhead of the cv::Mat to vx_image conversions).
      [Diagram: the same pipeline (OpenCV image source, cv::Mat to vx_image, color conversion, image pyramid, feature detection, optical flow, process points & find homography, warp perspective, stabilized frames) annotated with per-stage speedups of 1.4x, 0.6x, 4.9x, 2.3x, 4.6x and 1.7x.]
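
How per-stage factors combine into a single end-to-end figure can be sketched as follows. This is a minimal Python illustration, not part of VisionWorks: the function name `overall_speedup` and the time shares are hypothetical, because the slide does not give each stage's share of the runtime. It shows that a stage that becomes slower (a factor below 1, such as 0.6x) can be outweighed by gains in the stages that dominate the runtime.

```python
def overall_speedup(time_shares, stage_speedups):
    """Combine per-stage speedups into an end-to-end speedup.

    time_shares: fraction of the ORIGINAL runtime spent in each stage.
    stage_speedups: speedup factor of each stage (values below 1 are slowdowns).
    """
    if len(time_shares) != len(stage_speedups):
        raise ValueError("need one speedup per stage")
    old_time = sum(time_shares)
    # Each stage's new time is its old time divided by its speedup.
    new_time = sum(share / s for share, s in zip(time_shares, stage_speedups))
    return old_time / new_time


# Hypothetical two-stage split (assumed numbers, not from the slide):
# 20% of the original time in a stage that gets slower (0.6x) and
# 80% in a stage that gets faster (4.9x).
example = overall_speedup([0.2, 0.8], [0.6, 4.9])
print(f"end-to-end speedup: {example:.2f}x")
```

The end-to-end factor is a time-weighted harmonic combination of the stage factors, so it cannot be read off as a plain average of the numbers on the diagram.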

  19. VisionWorks OpenVX™ Graph Mode: Video Stabilization Sample
      • OpenVX graph mode calls are prefixed with "vx".
      • The OpenVX graph enables advanced optimizations:
        • Buffer reuse, kernel fusion
        • Efficient use of streaming and CUDA textures
        • Automatic scheduling across processing units based on various factors (safety, perf, ...)
        • Tiling and pipelining vision functions at sub-frame level
      [Diagram: pipeline stages are image source, color conversion, image pyramid, feature detection, optical flow, process points & find homography, warp perspective, stabilized frames.]
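
Why knowing the whole graph up front makes buffer reuse possible can be sketched with a toy planner. This is a minimal Python illustration under stated assumptions, not VisionWorks' actual optimizer: `Node`, `plan_buffers`, the buffer names and the edges of the pipeline are all hypothetical, and every buffer is treated as interchangeable, which real images, pyramids and arrays are not. Once the last reader of an intermediate buffer is known, its storage can be handed to a later node.

```python
from collections import namedtuple

# A node reads some named buffers and writes exactly one.
Node = namedtuple("Node", ["name", "inputs", "output"])


def plan_buffers(nodes, external):
    """Map each intermediate buffer to a reusable slot.

    nodes must be listed in execution order; buffers named in `external`
    are owned by the application and are never pooled.
    Returns (slot_of, slot_count).
    """
    # Index of the last node that reads each buffer.
    last_use = {}
    for index, node in enumerate(nodes):
        for name in node.inputs:
            last_use[name] = index

    free_slots, slot_of, slot_count = [], {}, 0
    for index, node in enumerate(nodes):
        # Allocate the output first so it never aliases this node's inputs.
        if node.output not in external:
            if free_slots:
                slot_of[node.output] = free_slots.pop()
            else:
                slot_of[node.output] = slot_count
                slot_count += 1
        # Release inputs that no later node reads.
        for name in set(node.inputs):
            if name in slot_of and last_use[name] == index:
                free_slots.append(slot_of[name])
    return slot_of, slot_count


# Hypothetical wiring loosely modeled on the video stabilization stages.
pipeline = [
    Node("color_convert", ["frame"], "gray"),
    Node("pyramid", ["gray"], "pyr"),
    Node("features", ["gray"], "pts"),
    Node("optical_flow", ["pyr", "pts"], "tracked"),
    Node("homography", ["pts", "tracked"], "H"),
    Node("warp", ["frame", "H"], "stabilized"),
]
slot_of, slot_count = plan_buffers(pipeline, external={"frame", "stabilized"})
print(f"{len(slot_of)} intermediate buffers fit in {slot_count} slots")
```

With immediate-mode calls the library sees one function at a time and cannot know when an intermediate result is dead, which is one reason this kind of planning is tied to graph mode.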

  20. VisionWorks OpenVX™ Graph Mode: Video Stabilization Sample
      • Performance boost: the video stabilization application is further accelerated compared to immediate mode.
      [Diagram: pipeline stages are image source, color conversion, image pyramid, feature detection, optical flow, process points & find homography, warp perspective, stabilized frames.]

  21. VisionWorks CUDA API: Feature Tracking Sample
      • The VisionWorks CUDA API gives developers low-level access.
      • The developer manages:
        • Data allocations and transfer
        • Scheduling and pipelining
      [Diagram: camera/image/video input data, RGB frame (CUDA buffer), nvxcuColorConvert, YUV frame, nvxcuChannelExtract, gray frame, nvxcuGaussianPyramid, nvxcuOpticalFlowPyrLK, nvxcuHarrisTrack, array of keypoints, rendering/output.]

  22. VisionWorks™ API Selection
      • VisionWorks OpenVX™ Immediate Mode: quick port from other libraries; ability to reassign CPU and GPU tasks based on perf.
      • VisionWorks OpenVX™ Graph Mode: let the graph manager hide overheads, optimize and manage data; ability to reassign CPU and GPU tasks based on perf.
      • VisionWorks CUDA API: low-level CUDA API access for advanced CUDA developers

  23. Debugging with VisionWorks™: enable VisionWorks debug markers with "export NVX_PROF=nvtx"
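
The same variable can be set for a single launched process instead of the whole shell. The following is a minimal Python sketch: `run_with_nvtx_markers` is a hypothetical helper name, and the child process here is only a stand-in that echoes the variable, where a real run would launch the VisionWorks application under a profiler.

```python
import os
import subprocess
import sys


def run_with_nvtx_markers(command):
    """Run `command` with NVX_PROF=nvtx added to a copy of the environment."""
    env = dict(os.environ, NVX_PROF="nvtx")
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child process that just echoes the variable it received.
result = run_with_nvtx_markers(
    [sys.executable, "-c", "import os; print(os.environ['NVX_PROF'])"]
)
print(result.stdout.strip())
```

Passing a copied environment keeps the setting local to that one process, so other programs started from the same shell are unaffected.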

  24. VisionWorks™ Documentation: installed location: /usr/share/visionWorks/docs

  25. VisionWorks™ Facts
      • First Khronos OpenVX™ 1.0 compliant library (Jan 2015)
      • VisionWorks enables key demos (CES'16 and more at GTC)
      • 27K downloads (embedded) since release in Nov 2015, plus installed by default on all automotive platforms
      [Chart: weekly VisionWorks downloads for various platforms]

  26. Conclusion
      • VisionWorks Toolkit delivers multiple levels of API: OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API
      • Heterogeneous API enables switching from GPU to CPU; this is very powerful, reducing productization time
      • Delivers high performance: offers significant speedup over CUDA optimized OpenCV functions
      • Adopts native media APIs on Tegra platforms and delivers ready-to-use code samples
      Related sessions:
      • H6115 - Designing Computer Vision Applications with VisionWorks™, Pod B
      • S6739 - VisionWorks™ Toolkit Programming Tutorial, Room LL20A
      • L6129 - VisionWorks™ Toolkit LAB Session, Room 210C

  27. Resources & Useful Links
      • http://www.embedded-vision.com/
      • https://www.khronos.org/openvx/
      • https://developer.nvidia.com/embedded/visionworks
      • VisionWorks Webinars: https://developer.nvidia.com/embedded/learn/tutorials

  28. VisionWorks with Deep Learning Demo: Fully Convolutional Network
      [1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
      [2] "Efficient Convolutional Patch Networks for Scene Understanding," in CVPR Workshop on Scene Understanding (CVPR-WS).
      [3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015.
