High-performance Python-C++ bindings with PyPy and Cling


  1. Computational Research Division
     High-performance Python-C++ bindings with PyPy and Cling
     Wim Lavrijsen (LBNL) and Aditi Dutta (Nanyang Tech)
     PyHPC 2016: 6th Workshop on Python for High-Performance and Scientific Computing
     November 14, 2016, Salt Lake City, UT, USA

  2. Background: High Energy Physics
     ● High energy physics (HEP)
       – A.k.a. “particle physics”; explores matter, energy, and the fundamental forces of nature
       – Often works on huge, long-running experiments in large, geographically dispersed collaborations
       – The original “Big Data”
     ● Software development challenges
       – Range of different skill sets, preferences, interests
       – Large turnover of people over the experiment lifetime
       – Run on everything, everywhere: grids, clusters, HPC systems, clouds, and @home

  3. ATLAS Detector (image)

  4. Background: Python in HEP
     ● Historical timeline of Python usage
       – 2001: first interest and implementations
       – 2004: gone mainstream
       – 2009: drives frameworks, job transforms, analyses
       – 2013: Nobel Prize in Physics (Higgs boson)
       – 2016: first-class citizen in new experiments
     ● Technology
       – C++ adopted in 1994, main language since ~1998
       – Python bindings home-grown: piggy-backed on C++ reflection for serialization and interactivity (CINT)
       – Increased Python use thanks to machine learning

  5. H → ZZ → 2e2μ (image)

  6. Our Goals
     ● Support C++11 and beyond
     ● The scale and distribution to support large codes
     ● High performance (with PyPy)

  7. First Target: C++11 and beyond
     ● C++ language standardization went into hyperdrive
       – Then: C++98 [1]
       – Now: C++11, C++14, C++17, C++2x, ...
     ● We parse C++ headers for reflection to
       – Automate I/O and schema evolution
       – Use C++ interactively from an interpreter
       – Provide automatic Python-C++ bindings
     ● Impossible to keep up with a small team ...
     CINT, a home-grown parser that originated at HP, was replaced with Cling, an interactive C++ interpreter based on Clang (LLVM) and developed at CERN. Our CPython-based Python bindings have followed suit.
     [1] With technical corrigendum in '03

  8. Second Target: Scale and Distribution
     ● Problem: C++ developers, but Python users
       – Solution: fully automatic, interactive bindings, based on parsing C++ headers
     ● Problem: huge number of classes, functions, etc.
       – Solution: lazy lookup/creation: pre-compiled modules and bindings only at run-time
     ● Problem: lots of libraries and dependencies
       – Solution: automatic loaders with search paths
     ● Problem: name clashes, duplicates
       – Solution: follow C++ (i.e. linker) structure to scope and uniquely identify names
     ● Problem: too much “C++ feel”
       – Solution: reflection-based pythonizations (automatic) and regexp-based support for pythonizing common patterns
     ● Problem: different Python versions: v2, v3, CPython, pypy-c, ...
       – Solution: only core bindings module (cppyy) depends on Python
     We leverage Python's and Cling's dynamic natures to maximize lazy evaluation, leading to shorter startup times and lower memory use.
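
The lazy lookup/creation idea on this slide can be illustrated with a small pure-Python sketch. It does not use cppyy or Cling: `LazyNamespace`, the `known` table, and the names `Histogram` and `Ntuple` are hypothetical stand-ins for a namespace that builds a binding only when a name is first used, then caches it.

```python
class LazyNamespace:
    """Builds a binding the first time a name is accessed, then caches it."""

    def __init__(self, known):
        # 'known' stands in for what header parsing would discover:
        # a mapping from a C++ name to a factory that builds its proxy.
        self._known = known
        self.created = []  # records which bindings were actually built

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. on first use.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            factory = self._known[name]
        except KeyError:
            raise AttributeError(name) from None
        binding = factory()
        self.created.append(name)
        setattr(self, name, binding)  # cache: later lookups skip __getattr__
        return binding


# Hypothetical namespace with two known classes; nothing is built yet.
gbl = LazyNamespace({
    "Histogram": lambda: type("Histogram", (), {}),
    "Ntuple": lambda: type("Ntuple", (), {}),
})
```

Accessing `gbl.Histogram` builds and caches that one proxy, while `Ntuple` is never created unless it is used. This is the kind of saving in startup time and memory that the slide attributes to lazy evaluation.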

  9. Third Target: High Performance
     ● Important for perception and decision making
       – Python is not slow for those who use it
         ● Part truth (heavy CPU loads in C++), part self-selection
     ● Improve performance completely transparently
     [Figure: scale of Python performance with labels “fast enough”, “borderline”, and “too slow”; annotations “our target” and “new tools, extensions, annotations, ll rewrites”]
     Note: this turns out to be a rather small, and changing, group of Python users!

  10. Technologies (Re-)Used
      ● Goal: maximize reuse of existing projects
        – Capture expertise, maintenance, future development
      ● Projects so leveraged:
        – Cling/ROOT (C++ interpreter https://root.cern/cling )
        – Clang/LLVM (C++ compiler http://llvm.org )
        – PyPy (Python w/ JIT http://pypy.org )
        – CFFI (Python FFI to C https://cffi.readthedocs.io )
      ● Code size (not counting the ~1200 unit tests!):
        – CPython/cppyy: ~18K lines of C++, ~1K lines of (R)Python
        – PyPy/cppyy: ~2K lines of C++, ~4K lines of (R)Python
      Note: this work also builds on an earlier, Reflex-based version of cppyy. With Cling, we get more functionality, improved ease-of-use, better performance, and C++1x.

  11. Architecture (example of function calls shown)
      [Diagram. Components: Python, cppyy, Cling, Clang, ORCJit (LLVM), CFFI, C++ headers, C++ libraries. Step labels: 1: module import; 2: lazy lookup; 3: lazy lookup; 4: find; 5: parse; 6: AST; 7A: generate wrapper code; 7B: AST; 8: compile; 9: link; 10A: wrapper function ptrs; 10B: direct function ptrs; 11: args & call; 12: LL result; 13: Py result]
      Two paths:
      ● 7A-10A: wrappers
      ● 7B-10B: direct FFI

  12. Functionality
      ● Both Python and Cling are interactive, allowing:
        – Automatic template instantiations
          ● Transparent unique_ptr<>, shared_ptr<>, etc.
        – std::vector<> optimizations
        – Offset calculations for multiple virtual inheritance
        – Cross-language derivation (both ways)
        – Etc.?! More ideas to explore ...
      ● C++1x much better at expressing ownership
        – Improved automatic memory management
        – Also targeted semi-manually with pythonizations
          ● E.g. name-based, custom smart pointers, etc.
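
As a rough illustration of what a name-based pythonization looks like, the sketch below matches class names against a regular expression and, on a match, maps a C++-style `size()` method onto Python's `len()`. It shows only the name-matching mechanism on a container pattern, not the ownership case from the slide. Everything here is hypothetical and plain Python (`pythonize`, `RULES`, `add_len`, `TrackVector`); it is not the cppyy API.

```python
import re


def add_len(cls):
    # Map a C++-style size() method onto Python's len().
    if hasattr(cls, "size") and not hasattr(cls, "__len__"):
        cls.__len__ = lambda self: self.size()


# Hypothetical rule table: (class-name pattern, action applied on a match).
RULES = [
    (re.compile(r"(Vector|List)$"), add_len),
]


def pythonize(cls, rules=RULES):
    """Apply every rule whose pattern matches the class name."""
    for pattern, action in rules:
        if pattern.search(cls.__name__):
            action(cls)
    return cls


class TrackVector:
    """Stand-in for a bound C++ container exposing size()."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)


pythonize(TrackVector)
```

After `pythonize(TrackVector)`, `len(TrackVector([1, 2, 3]))` works like it does for a native Python container. Classes whose names do not match the pattern are left untouched.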

  13. Optimizations
      ● PyPy JIT is conservative and optimizes Python
        – JIT hints needed to “teach it C++,” e.g.:
          ● Class hierarchies are fixed (and so are most offsets)
          ● Side-effect free functions are elidable
          ● Specialized paths (e.g. lookups, overloading, FFI)
        – Need micro-benchmarks to debug & verify performance
          ● Hints are elementary, so this scales to more complex codes
      ● Set of micro-benchmarks follows
        – Not feature-set exhaustive (yet)
        – Comparisons made with:
          ● Target: optimized C++
          ● CPython/cppyy: the default of most of our users
          ● SWIG: well-known, widely used
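
A micro-benchmark of this kind can be set up with the standard `timeit` module. The sketch below is an assumption about the general shape of such a harness, not the harness used for the talk: `per_call_seconds` and `empty` are hypothetical names, and a real comparison would time a bound C++ function in place of the pure-Python `empty`.

```python
import timeit


def empty():
    """Pure-Python stand-in for an empty bound function."""
    pass


def per_call_seconds(func, number=100_000, repeat=5):
    """Best-of-'repeat' average time per call over a tight loop of calls."""
    timer = timeit.Timer(func)
    # min() over the repeats reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("empty call: %.1f ns" % (per_call_seconds(empty) * 1e9))
```

On a JIT such as PyPy's, `number` has to be large enough for the loop to warm up before the measurement is meaningful.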

  14. Micro-benchmark: empty function call
      [Chart; annotated value: 0.2x]
      Remaining overhead is GIL release/re-acquire (~20x). pypy-c pure inlines.

  15. Micro-benchmark: “complex” function call
      [Chart; annotated value: 1.5x]
      Remaining overhead is GIL release/re-acquire (~3x).

  16. Micro-benchmark: overloaded function call
      [Chart; annotated value: 18x]
      SWIG tries methods in order; cppyy hashes successful calls. FFI suffers from GIL.
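
The difference between trying overloads in order and hashing successful calls can be sketched in plain Python. This is a simplified illustration, not cppyy's or SWIG's implementation: `OverloadSet` is a hypothetical class that matches on Python argument types, tries candidates in order on the first call, and afterwards dispatches straight from a cache keyed on the argument types.

```python
class OverloadSet:
    """Dispatches on argument types and remembers which overload matched."""

    def __init__(self, *candidates):
        # Each candidate is (tuple_of_parameter_types, callable).
        self._candidates = candidates
        self._cache = {}   # argument-type tuple -> overload that matched
        self.attempts = 0  # counts in-order trials, for illustration only

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        cached = self._cache.get(key)
        if cached is not None:
            return cached(*args)  # fast path: no trial and error
        for signature, func in self._candidates:
            self.attempts += 1
            if len(signature) == len(args) and all(
                isinstance(a, t) for a, t in zip(args, signature)
            ):
                self._cache[key] = func
                return func(*args)
        raise TypeError("no matching overload for %r" % (key,))


# Hypothetical overload set standing in for f(int), f(double), f(std::string).
describe = OverloadSet(
    ((int,), lambda x: "int"),
    ((float,), lambda x: "double"),
    ((str,), lambda x: "std::string"),
)
```

The first `describe("a")` walks all three candidates. Repeated calls with a `str` argument hit the cache and add nothing to `attempts`, which is why the cached approach pays off in a loop.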

  17. Micro-benchmark: data member access
      [Chart; annotated values: 1.7x, 4.2x]
      SWIG creates Python properties in Python, CPython/cppyy in C++.

  18. Micro-benchmark: std::vector<int>
      [Chart; annotated value: 15x]
      There's a frame left in the FFI path; pypy-c pure uses array.array.

  19. Realistic Code
      Creates values, applies some math, makes selections, stores the results in histograms and ntuple format, and writes them to disk.

  20. Caveats
      ● Non-JITed pypy-c is ~2x slower than CPython
        – Is a code generation problem; don't expect a fix
      ● PyPy uses a true garbage collector
        – C++ destructors called “randomly”
          ● Can call gc.collect() explicitly to force calls
          ● No guarantee that destructors will be called on exit
        – No true RAII possible
      ● PyPy JIT can be fickle
        – Inner loop branches take a long time to heat up
        – Minor code changes can cause performance drops
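
When destructor timing depends on the garbage collector, one common Python-side workaround is to make the release explicit with a context manager and keep `__del__` only as a fallback. The sketch below is a plain-Python illustration of that pattern; `Resource` is a hypothetical stand-in for a proxy that owns a C++ object, and `release()` mimics the destructor call.

```python
import gc


class Resource:
    """Stand-in for a proxy owning a C++ object."""

    live = 0  # number of resources not yet released

    def __init__(self):
        self.released = False
        type(self).live += 1

    def release(self):
        # Idempotent: safe to call from both __exit__ and __del__.
        if not self.released:
            self.released = True
            type(self).live -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()  # deterministic, independent of the collector

    def __del__(self):
        self.release()  # fallback; timing depends on the interpreter's GC


def scoped_use():
    """Release happens at the end of the 'with' block on any interpreter."""
    with Resource() as res:
        inside = Resource.live
    return inside, Resource.live, res.released
```

Without the `with` block, the release happens whenever the object is collected. Calling `gc.collect()` after dropping the last reference forces the issue, as the slide notes.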

  21. Distribution
      ● Two modules and a pip package for externals
        – cppyy in PyPy is a builtin module
          ● Currently on the cling-support branch; on main soon
        – cppyy for CPython is an extension module
          ● In most Linux distros, MacPorts, etc. (as part of ROOT)
        – Pip package with externals (for PyPy) to be released
      ● Licenses:
        – All open source, all very permissive

  22. Conclusions
      ● We developed Cling-based Python-C++ bindings
        – Supports C++1x and beyond
        – Supports large C++ codes
        – High performance with PyPy
      ● Combined interactive C++ with Python
        – New functionality and optimizations
      ● Showed 3x improvement for realistic code
      This work was supported by the ATLAS Collaboration, Google Summer of Code, and CERN SFT.
