Experiences with Building Domain-Specific Compilation Plugins in Graal


  1. Experiences with Building Domain-Specific Compilation Plugins in Graal. ManLang’17, 28 Sep 2017. Colin Barrett, Christos Kotselidis, Foivos S. Zakkak, Nikos Foutris, Mikel Luján. Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.

  2. 1 Introduction. Is there a way to create domain-specific compiler optimizations without having to learn the whole compilation stack? Yes! Modular JIT compilers (e.g. Graal). ManLang’17, 28 Sep 2017, F. Zakkak - foivos.zakkak@manchester.ac.uk

  4. 1 Introduction. Introduction: ■ Computer vision applications are becoming mainstream (e.g. autonomous vehicles, virtual reality) ■ Both on embedded and desktop environments ■ Ongoing effort to: □ Increase accuracy □ Optimize performance

  5. 1 Introduction. Background: Simultaneous Localization And Mapping (SLAM) applications. Input: a stream of frames from cameras moving in an unknown environment. Output: ■ 3D reconstruction of environment ■ Cameras’ location in the environment ■ Absolute positions of objects in the environment

  8. 2 LSD-SLAM. Our case: Large-Scale Direct monocular SLAM (LSD-SLAM) ■ Monocular: uses a single camera for input ■ Non feature-based, operates on image densities ■ Uses pose-graphs. Pose-graph: a graph where: ■ nodes are frames ■ directed edges contain the transformations (rotation, scaling, and translation) and the corresponding covariance matrix from the previous frame
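
The pose-graph described on this slide can be sketched as a small data structure. This is a minimal illustration only: the class and field names are hypothetical, the quaternion encoding of rotation and the 7x7 covariance layout (Sim(3) has seven degrees of freedom) are assumptions, and none of it is taken from the LSD-SLAM code base.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pose-graph sketch; names and layout are illustrative assumptions. */
final class PoseGraphSketch {

    /** A node is a frame, identified here only by an integer id. */
    static final class FrameNode {
        final int id;
        FrameNode(int id) { this.id = id; }
    }

    /** A directed edge: a similarity transform from one frame to another plus its covariance. */
    static final class Edge {
        final FrameNode from;
        final FrameNode to;
        final float[] rotation;    // unit quaternion (w, x, y, z), an assumed encoding
        final float[] translation; // (x, y, z)
        final float scale;
        final float[] covariance;  // 7x7 row-major, an assumed layout

        Edge(FrameNode from, FrameNode to, float[] rotation,
             float[] translation, float scale, float[] covariance) {
            if (rotation.length != 4 || translation.length != 3 || covariance.length != 49) {
                throw new IllegalArgumentException("unexpected array length");
            }
            this.from = from;
            this.to = to;
            this.rotation = rotation.clone();       // defensive copies keep the edge immutable
            this.translation = translation.clone();
            this.scale = scale;
            this.covariance = covariance.clone();
        }
    }

    private final List<FrameNode> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    /** Adds a new frame node and returns it. */
    FrameNode addFrame() {
        FrameNode n = new FrameNode(nodes.size());
        nodes.add(n);
        return n;
    }

    /** Adds a directed edge between two existing frames. */
    Edge connect(FrameNode from, FrameNode to, float[] rotation,
                 float[] translation, float scale, float[] covariance) {
        Edge e = new Edge(from, to, rotation, translation, scale, covariance);
        edges.add(e);
        return e;
    }

    /** Returns the edges that start at the given frame. */
    List<Edge> outgoing(FrameNode n) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == n) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Map optimization would then adjust the transforms stored on the edges; that step is not shown here.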

  10. 2 LSD-SLAM. LSD-SLAM overview (diagram). Stages: Tracking (create SE(3) pose from frames); Depth Estimation (using pose and matched pixels); Map Optimization (minimize error in Sim(3) poses). Diagram labels: frame, key-frame-x, key-frame-y, key-frame-z, Sim(3) pose.

  11. 2 LSD-SLAM. LSD-SLAM breakdown: Depth Estimation (49.4%), Tracking (40.7%), Map Optimisation (3.3%), misc. A second breakdown lists Pose Arithmetic (18.4%, includes SE(3) Logarithm), Point Transform, Levenberg-Marquardt Update, Gradient Interpolation, and misc., with three segments labelled 40%, 27%, and 13%. Mean execution time (ns) per framework for the kernels Point Trans., SE(3) Log., Gradient Inter., and L-M Update (values are listed in the order they appear in the slide text; their assignment to kernels is not recoverable): Eigen (C++): 13.342, 131.138, 9.847, 152.376; EJML (Java): 77.411, 415.924, 84.479, 308.412; JEigen (JNI): 1356.498, 1671.105, 58.961, 895.845.

  14. 2 LSD-SLAM. Performance Characterization: the code generated by the JIT compiler is worse than the hand-tuned Eigen ■ JIT compiler fails to inline some methods in the critical path ■ Opportunities for constant folding and sub-expression elimination are missed ■ No SIMD
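
To make the sub-expression elimination bullet concrete, the sketch below applies the rewrite by hand to a 2D rotation. It is an illustrative assumption, not code from LSD-SLAM or Indigo, and whether a particular JIT compiler performs this rewrite on its own depends on the compiler and on the shape of the code.

```java
/** Hand-applied common sub-expression elimination on a 2D rotation (illustrative only). */
final class CseSketch {

    /** Naive form: cos(a) and sin(a) each appear twice in the source. */
    static double[] rotateNaive(double x, double y, double a) {
        return new double[] {
            x * Math.cos(a) - y * Math.sin(a),
            x * Math.sin(a) + y * Math.cos(a)
        };
    }

    /** Same computation with the repeated sub-expressions computed once and reused. */
    static double[] rotateCse(double x, double y, double a) {
        double c = Math.cos(a);
        double s = Math.sin(a);
        return new double[] { x * c - y * s, x * s + y * c };
    }
}
```

Both methods compute the same rotation; the second simply avoids re-evaluating the trigonometric calls.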

  15. 3 Indigo. Indigo: Our Approach. A small vector and matrix library ■ Up to 8 elements and 8x8 cells. Encapsulated and immutable ■ Reduces object allocation ■ Reduces memory indirection ■ Enables constant folding ■ Enhances common sub-expression elimination. Accompanied by a Graal plugin ■ Force inline methods of the library ■ Custom register allocation ■ SIMD backend
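
The "encapsulated and immutable" design can be illustrated with a small value-style vector class. This is a minimal sketch under stated assumptions: the class name `ImmutableVec3` and its methods are hypothetical and do not reproduce Indigo's actual API.

```java
/** Hypothetical immutable 3-element vector; not Indigo's real class. */
final class ImmutableVec3 {
    // All state is final, so an instance never changes after construction.
    final float x;
    final float y;
    final float z;

    ImmutableVec3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Element-wise sum; returns a new vector instead of mutating this one. */
    ImmutableVec3 add(ImmutableVec3 o) {
        return new ImmutableVec3(x + o.x, y + o.y, z + o.z);
    }

    /** Dot product. */
    float dot(ImmutableVec3 o) {
        return x * o.x + y * o.y + z * o.z;
    }

    /** Cross product; returns a new vector. */
    ImmutableVec3 cross(ImmutableVec3 o) {
        return new ImmutableVec3(
            y * o.z - z * o.y,
            z * o.x - x * o.z,
            x * o.y - y * o.x);
    }
}
```

Because every operation returns a fresh object and no field can change, a compiler is free to treat such values like plain scalars once the methods are inlined, which is the kind of opportunity the slide lists.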

  16. 3 Indigo. Why a new backend and register allocator? 1. There is no publicly accessible SIMD assembler in Graal 2. The JVM does not support SIMD registers 3. The JVM cannot handle SIMD registers during register spillage

  17. 3 Indigo. Indigo: Assumptions for SIMD acceleration suitable for vector operations in SLAM ■ Hardware supports 128-bit vector operations ■ Indigo’s classes/subclasses contain single-precision floating point numbers ■ Unused elements of a vector are zero ■ The elements of a vector are contiguous in memory ■ Once constructed, a vector is immutable
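
The assumptions above can be sketched in plain Java: a 128-bit register holds four single-precision floats, so storage is padded with zeros to a multiple of four and processed one four-element group at a time. The class `PaddedVector` is hypothetical, and the inner loops are scalar stand-ins for what a SIMD backend would emit as single vector instructions.

```java
import java.util.Arrays;

/** Hypothetical zero-padded, immutable float vector; a scalar model of 128-bit lanes. */
final class PaddedVector {
    static final int LANES = 4; // 128 bits / 32-bit float

    private final float[] data; // contiguous; length is a multiple of LANES
    final int size;             // number of meaningful elements

    private PaddedVector(float[] data, int size) {
        this.data = data;
        this.size = size;
    }

    /** Copies the values into storage rounded up to a multiple of LANES; padding is 0.0f. */
    static PaddedVector of(float... values) {
        int padded = ((values.length + LANES - 1) / LANES) * LANES;
        return new PaddedVector(Arrays.copyOf(values, padded), values.length);
    }

    int paddedLength() {
        return data.length;
    }

    float get(int i) {
        return data[i];
    }

    /** Adds group by group; each inner loop models one 4-wide vector addition. */
    PaddedVector add(PaddedVector o) {
        if (o.size != size) {
            throw new IllegalArgumentException("size mismatch");
        }
        float[] r = new float[data.length];
        for (int base = 0; base < data.length; base += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                r[base + lane] = data[base + lane] + o.data[base + lane];
            }
        }
        return new PaddedVector(r, size);
    }

    /** Dot product over the full padded width; zero padding contributes nothing. */
    float dot(PaddedVector o) {
        if (o.size != size) {
            throw new IllegalArgumentException("size mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] * o.data[i];
        }
        return sum;
    }
}
```

The zero-padding assumption is what lets `dot` run over whole groups without a special case for the tail elements.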

  18. 3 Indigo. Indigo Compilation Plugin Outline

  19. 4 Evaluation. Evaluation Setup. Hardware: Processor: Intel Core i7 4770 3.4GHz; Cores / Hardware threads: 4 / 8; Main memory: 16GB; Vector Units: SSE 4.2 and AVX2. Software: OS: Windows 8.1; C++ compiler: MSVC 17.00.61030 (x64); JVM: Java SE 1.8.0_72 64-Bit JVMCI VM; Baseline: Apache CML 3.6. Comparison Methodology: 1. Indigo vs Apache CML as a generic small vectors and matrices Java library 2. Indigo vs Eigen as a SLAM-specific library

  20. 4 Evaluation. Indigo vs Apache CML: Vector Operations (bar chart). Series: Indigo, Indigo-SIMD. Y-axis: Speedup (vs Apache CML), 0 to 10. Operations: Addition, Cross Product, Scalar Division, Dot Product, Hamilton Product, Scalar Multiplication, Subtraction.
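
Of the operations in this chart, the Hamilton product (quaternion multiplication) is the least self-explanatory. The sketch below shows the standard formula on plain `float[4]` arrays in (w, x, y, z) order; the class name and array layout are assumptions for illustration and are not taken from Indigo or Apache CML.

```java
/** Standard Hamilton product of two quaternions stored as (w, x, y, z); illustrative only. */
final class HamiltonProductSketch {

    static float[] multiply(float[] a, float[] b) {
        float w1 = a[0], x1 = a[1], y1 = a[2], z1 = a[3];
        float w2 = b[0], x2 = b[1], y2 = b[2], z2 = b[3];
        return new float[] {
            w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2, // w
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2, // x
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2, // y
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2  // z
        };
    }
}
```

The product is not commutative: multiplying the basis quaternions i and j gives k, while j times i gives -k.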

  21. 4 Evaluation. Indigo vs Apache CML: Matrix Operations (bar chart). Series: Indigo, Indigo-SIMD. Y-axis: Speedup (vs Apache CML), 0 to 70. Operations: Addition, Scalar Division, Scalar Multiplication, Vector Multiplication, Matrix Multiplication, Subtraction.

  22. 4 Evaluation. Indigo vs Eigen: SLAM kernels (bar chart). Series: Indigo (w/o Graal extensions), Indigo-SIMD. Y-axis: Speedup (vs Eigen), 0 to 3. Kernels: Point Transform, SE(3) Logarithm, Gradient Interpolation, L-M Update.
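
As a rough picture of what a kernel such as Point Transform involves, the sketch below applies a rigid-body pose to a 3D point as p' = R p + t. This assumes the kernel is a plain SE(3) transform with R stored as a row-major 3x3 array; the slides do not define the kernel, and the names here are hypothetical.

```java
/** Applies p' = R p + t with R as a row-major 3x3 matrix; an assumed form of the kernel. */
final class PointTransformSketch {

    static float[] transform(float[] r, float[] t, float[] p) {
        if (r.length != 9 || t.length != 3 || p.length != 3) {
            throw new IllegalArgumentException("expected 3x3 rotation and 3-element vectors");
        }
        float[] out = new float[3];
        for (int row = 0; row < 3; row++) {
            // Row of R times the point, then add the translation component.
            out[row] = r[3 * row] * p[0]
                     + r[3 * row + 1] * p[1]
                     + r[3 * row + 2] * p[2]
                     + t[row];
        }
        return out;
    }
}
```

A kernel this small is dominated by a handful of multiply-adds, which is why inlining and SIMD code generation matter for it.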

  23. 5 Conclusions. Conclusions ■ Domain-specific optimizations have significant impact on the performance of domain-specific applications ■ Modular JIT compilers like Graal ease such optimizations through plugins ■ Indigo demonstrates that SLAM applications written in Java can be significantly optimized using this approach

  24. Experiences with Building Domain-Specific Compilation Plugins in Graal. ManLang’17, 28 Sep 2017. Colin Barrett, Christos Kotselidis, Foivos S. Zakkak, Nikos Foutris, Mikel Luján. Thank You!
