Methodology for mapping image processing algorithms on massively parallel processors
An NVIDIA GPU specific approach
Florian Gouin, Corinne Ancourt
firstname.name@mines-paristech.fr
MINES ParisTech, PSL Research University, Paris
Centre de Recherche en Informatique
22/06/2017
French community of compilation, 12th meeting, Saint Germain au Mont d'Or

Sections: Context and motivation, Application case, Mapping methodology, Experiments, Conclusion

Motivation: Image processing domain
Figure: Image processing examples

Motivation: Image processing domain
General trends for today and tomorrow:
- Data source volume is growing exponentially
- Data sources tend to multiply
- Available computing time tends to shrink for real-time processing
- Image processing algorithms are becoming ever more complex

Motivation: Architectural evolution
Figure: Processor frequency wall
Figure: Flynn's taxonomy (instructions versus data: SISD, MISD, SIMD, MIMD, plus SIMT)
Figure: NVIDIA Kepler processor, 192-core architecture

Motivation: Why do we need a methodology?
Parallel thinking is not trivial. The following methodology has been developed to provide:
- assistance for GPU developers,
- improved software production for industry,
- support for engineers from other domains,
- assistance in optimising software for a specific GPU architecture.
Tools and compilers can give limited results in some cases:
- dynamic control code
- intensive function calls
- pointer arithmetic
- object-oriented languages
- ...

Content
1 Application case
2 Mapping methodology
3 Experiments
4 Conclusion

Optical Flow algorithms: Optical Flow definition
Principle: motion quantification of each pixel taken from two distinct pictures.
Image processing application:
- spatial characterization
- temporal characterization
Examples of applications:
- Motion estimation
- Image stabilization
- Image segmentation
- Moving object tracking
- SLAM algorithms
- ...
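To make the principle concrete, here is a minimal sketch of per-pixel motion estimation by exhaustive patch matching. It is not the SimpleFlow algorithm; the function names, the ramp test image, the patch radius and the search range are all assumptions chosen for illustration.

```python
def shift_image(img, dx, dy, fill=0):
    """Return a copy of img whose content is moved by (dx, dy); uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)] for y in range(h)]


def patch_cost(img_a, img_b, y, x, dy, dx, radius):
    """Sum of absolute differences between the patch of img_a centred on (y, x)
    and the patch of img_b centred on (y + dy, x + dx)."""
    cost = 0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            cost += abs(img_a[y + j][x + i] - img_b[y + dy + j][x + dx + i])
    return cost


def pixel_motion(img_a, img_b, y, x, radius=1, search=2):
    """Return the (dx, dy) displacement in the search window with the lowest patch cost."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost = patch_cost(img_a, img_b, y, x, dy, dx, radius)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


# Hypothetical 10x10 test image (an intensity ramp) and a copy moved by dx=1, dy=2.
size = 10
image_1 = [[10 * y + x for x in range(size)] for y in range(size)]
image_2 = shift_image(image_1, dx=1, dy=2)

motion = pixel_motion(image_1, image_2, y=4, x=4)
print("Estimated (dx, dy) at pixel (4, 4):", motion)
```

Every pixel is handled independently of the others here, which is the property that makes this family of algorithms a natural candidate for massively parallel processors.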

Optical Flow algorithms: industrial application example
Figure: Example of motion flow analysis. Tesla Motor Company automatic drive.

SimpleFlow algorithm: Algorithm data
The SimpleFlow [1] algorithm is available in the OpenCV extensions.
- Approximately 600 lines of code
- Sequential algorithm
- Dynamic control code
- Approximate runtime for a pair of 2-million-pixel images:
  - 200 s on an NVIDIA Jetson TX1, ARM Cortex A57 (1.9 GHz) + A53 (1.3 GHz)
  - 50 s on a desktop computer, Intel Core i7-4770S (8 logical cores at 3.1 GHz)
- Ideal runtime: 40 ms
- Language and library: C++ with the OpenCV library

[1] Michael W. Tao et al. "SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm". In: Computer Graphics Forum (Eurographics 2012) 31.2 (May 2012). URL: http://graphics.berkeley.edu/papers/Tao-SAN-2012-05/
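The gap between the measured runtimes and the 40 ms target can be made explicit with a short calculation. The sketch below only uses the figures quoted on the slide; the variable and function names are illustrative.

```python
# Runtimes quoted on the slide, in milliseconds.
measured_runtime_ms = {
    "Jetson TX1 CPU": 200_000,       # 200 s
    "Core i7-4770S desktop": 50_000,  # 50 s
}
ideal_runtime_ms = 40


def required_speedup(measured_ms, target_ms):
    """Factor by which the measured runtime must shrink to reach the target."""
    return measured_ms / target_ms


speedups = {name: required_speedup(ms, ideal_runtime_ms)
            for name, ms in measured_runtime_ms.items()}
pairs_per_second = 1000 / ideal_runtime_ms

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x speedup needed")
print(f"Target throughput: {pairs_per_second:.0f} image pairs per second")
```

A 40 ms budget corresponds to 25 image pairs per second, so the sequential code is three to four orders of magnitude away from the target.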

SimpleFlow algorithm: Simplified call graph
Figure: Simplified call graph. The node style distinguishes SimpleFlow functions, OpenCV functions, and functions from the C++ standard library.
Functions appearing in the graph: calcOpticalFlowSF, selectPointsToRecalcFlow, upscaleOpticalFlow, buildPyramidWithResizeMethod, ones, GaussianBlur, mixChannels, removeOcclusions, calcIrregularityMat, calcConfidence, extrapolateFlow, calcOpticalFlowSingleScaleSF, crossBilateralFilter, resize, zeros, max, dist, min, cvRound, extrapolateValueInRect, multiply, wd, wc, split, sum, copyMakeBorder, exp.

SimpleFlow algorithm: Application example
Figure: Image 1 (t)
Figure: Image 2 (t + δ)
Figure: X coordinate pixel motions
Figure: Y coordinate pixel motions

Overview - Macroscopic scale
source code -> code analyses -> loop nest transformations for SIMT architectures -> loop optimisations -> GPU specialisation -> GPU mapping -> CPU+GPU source code

Code analyses
Figure: methodology overview (source code -> code analyses -> loop nest transformations for SIMT architectures -> loop optimisations -> GPU specialisation -> GPU mapping -> CPU+GPU source code)

Code analyses
Figure: code analysis flow.
- Inputs: application executable file, source code
- Steps: profiling, compilation, function call detection, loop detection, array detection, branch detection, block identification, loop iteration analysis, array accesses analysis, dependence analysis, loop mining
- Measurements: global runtime, function runtime, loop runtime
- Outputs: parallel loops, sequential loops
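The split into parallel and sequential loops rests on whether a loop carries a dependence from one iteration to the next. The following sketch illustrates the difference empirically by running two loops in different iteration orders; it is not a dependence analysis, and both loop bodies are made-up examples.

```python
def brighten(pixels, order):
    """Iteration i reads pixels[i] and writes out[i] only: no loop-carried dependence."""
    out = [0] * len(pixels)
    for i in order:
        out[i] = min(255, pixels[i] + 10)
    return out


def running_sum(pixels, order):
    """Iteration i reads the accumulator written by earlier iterations: loop-carried dependence."""
    out = [0] * len(pixels)
    acc = 0
    for i in order:
        acc += pixels[i]
        out[i] = acc
    return out


pixels = [12, 250, 37, 90, 5, 180]
forward = list(range(len(pixels)))
backward = forward[::-1]

# A loop without loop-carried dependences gives the same result in any order,
# so its iterations can be distributed over GPU threads.
brighten_order_independent = brighten(pixels, forward) == brighten(pixels, backward)

# A loop with a loop-carried dependence does not, so it stays sequential
# unless it is rewritten (for example as a parallel scan or reduction).
running_sum_order_independent = running_sum(pixels, forward) == running_sum(pixels, backward)

print("brighten is order independent:", brighten_order_independent)
print("running_sum is order independent:", running_sum_order_independent)
```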

loop nest transformations for SIMT architectures
Figure: methodology overview (source code -> code analyses -> loop nest transformations for SIMT architectures -> loop optimisations -> GPU specialisation -> GPU mapping -> CPU+GPU source code)

loop nest transformations for SIMT architectures
Inputs: parallel loops, sequential loops
Steps: GPU loop identification -> GPU loop pattern -> GPU loop size -> GPU memory size
Output: GPU loop nests

GPU loop pattern (// marks a parallel loop, ↓ a sequential loop):
- blocks, $1 \le \#b \le 3$: $b_0$ //, $b_1$ //, $b_2$ //
- threads, $0 \le \#t \le 3$: $t_0$ // or ↓, $t_1$ // or ↓, $t_2$ // or ↓
- body: $s$

GPU loop size:
- $b = b_0 \times b_1 \times b_2$, with $b_0 < 2147483647$, $b_1 < 65535$, $b_2 < 65535$
- $b \gg t$
- $t = t_0 \times t_1 \times t_2$, with $t < 1024$, $t_0 < 1024$, $t_1 < 1024$, $t_2 < 64$
- $t \bmod 32 = 0$
- $t > 4 \times 32$

GPU memory size:
- Global memory footprint < GPU memory
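The GPU loop size constraints lend themselves to a mechanical check. The sketch below encodes the inequalities exactly as written on the slide, with strict bounds; CUDA's hardware limits themselves are inclusive (1024 threads per block is a legal launch). The function name, the error messages and the 1920 x 1080 example are assumptions, and the qualitative condition b ≫ t is left out because the slide gives no threshold for it.

```python
from math import prod

WARP_SIZE = 32
GRID_LIMITS = (2147483647, 65535, 65535)  # bounds on b0, b1, b2 from the slide
BLOCK_LIMITS = (1024, 1024, 64)           # bounds on t0, t1, t2 from the slide
MAX_THREADS_PER_BLOCK = 1024


def check_gpu_loop_size(blocks, threads):
    """Return the list of slide constraints violated by a candidate mapping.

    blocks  -- iteration counts of the block-level loops (b0, b1, b2)
    threads -- iteration counts of the thread-level loops (t0, t1, t2)
    """
    problems = []
    if not 1 <= len(blocks) <= 3:
        problems.append("need 1 to 3 block loops")
    if len(threads) > 3:
        problems.append("need at most 3 thread loops")
    for i, (size, limit) in enumerate(zip(blocks, GRID_LIMITS)):
        if not size < limit:
            problems.append(f"b{i} must be < {limit}")
    for i, (size, limit) in enumerate(zip(threads, BLOCK_LIMITS)):
        if not size < limit:
            problems.append(f"t{i} must be < {limit}")
    t = prod(threads)  # total threads per block
    if not t < MAX_THREADS_PER_BLOCK:
        problems.append("t must be < 1024")
    if t % WARP_SIZE != 0:
        problems.append("t must be a multiple of 32")
    if not t > 4 * WARP_SIZE:
        problems.append("t must be > 4 x 32")
    return problems


# Hypothetical 1920 x 1080 image covered by 16 x 16 thread blocks.
good = check_gpu_loop_size(blocks=(120, 68), threads=(16, 16))
# Same image with 32 x 32 blocks: 1024 threads break the slide's strict bound.
bad = check_gpu_loop_size(blocks=(60, 34), threads=(32, 32))

print("16 x 16 blocks:", good or "all constraints satisfied")
print("32 x 32 blocks:", bad)
```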

loop nest transformations for SIMT architectures
Inputs: parallel loops, sequential loops
Steps: GPU loop identification -> GPU loop pattern -> GPU loop size -> GPU memory size
Output: GPU loop nests
Loop transformations available: strip mining, parallel reduction, fusion, fission, tiling, interchange, splitting, coalescing.
The GPU loop pattern and GPU loop size steps each rely on six of these transformations (marked X on the slide).
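As an illustration of one of these transformations, the sketch below strip-mines a single loop into a two-level nest whose outer and inner loops could then be mapped to block and thread indices. It is a plain Python model of the index arithmetic, not generated GPU code; the names and the sizes (1000 iterations, strips of 256) are arbitrary.

```python
def original_loop(n):
    """Indices visited by the original single loop."""
    return [i for i in range(n)]


def strip_mined_loop(n, strip):
    """Same iterations expressed as an outer 'block' loop and an inner 'thread' loop."""
    visited = []
    num_blocks = (n + strip - 1) // strip  # ceiling division
    for block in range(num_blocks):        # candidate for a block index
        for thread in range(strip):        # candidate for a thread index
            i = block * strip + thread
            if i < n:                      # guard for the last, partial strip
                visited.append(i)
    return visited


n, strip = 1000, 256
same_iterations = strip_mined_loop(n, strip) == original_loop(n)
num_blocks = (n + strip - 1) // strip

print("blocks needed:", num_blocks)
print("same iteration set and order:", same_iterations)
```

Choosing the strip size as a multiple of 32 keeps the inner loop compatible with the thread-count constraints listed for the GPU loop size step.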
