

  1. Making Pandas Fly (live from London) EuroPython 2020 Ian Ozsvald @IanOzsvald – ianozsvald.com

  2. Introductions: Interim Chief Data Scientist; 19+ years experience; 2nd Edition!; team coaching & public courses (I’m sharing from my Higher Performance Python course). Ian Ozsvald By [ian]@ianozsvald[.com]

  3. Thank the organisers! All volunteers: go say thank you in #lobby. They’ve put in a huge amount of volunteered work for us!

  4. Today’s goal: Pandas (saving RAM to fit in more data; calculating faster by dropping to NumPy); advice for “being highly performant”; has Covid 19 affected UK company registrations?

  5. Strings are expensive and slow

  6. Categoricals are cheap and fast! Circa 1% of previous memory cost
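
A minimal sketch of the comparison behind slides 5 and 6, assuming a hypothetical low-cardinality string column (the values and row count are made up, not the talk's dataset). The saving you see depends on the data and on the pandas version's default string storage, so treat the slide's "circa 1%" as specific to the talk's data.

```python
import pandas as pd

# Hypothetical data: four distinct labels repeated many times.
ser_str = pd.Series(["ltd", "plc", "llp", "charity"] * 25_000)
ser_cat = ser_str.astype("category")

# deep=True counts the string payloads, not just the pointers to them.
bytes_str = ser_str.memory_usage(deep=True)
bytes_cat = ser_cat.memory_usage(deep=True)

print(f"strings:  {bytes_str:,} bytes")
print(f"category: {bytes_cat:,} bytes")
print(f"category uses {bytes_cat / bytes_str:.1%} of the string cost")
```

The categorical stores each distinct label once plus a small integer code per row, which is where the saving comes from.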

  7. Categoricals “.cat” accessor
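
As an illustration of the ".cat" accessor, here is a small sketch on made-up labels (not the slide's own example). It shows the two pieces a categorical is built from: the distinct categories and the per-row integer codes.

```python
import pandas as pd

# Hypothetical labels; pandas infers and sorts the categories.
ser = pd.Series(["ltd", "plc", "ltd", "llp"], dtype="category")

categories = list(ser.cat.categories)  # distinct labels, stored once
codes = ser.cat.codes.tolist()         # one small integer per row

print(categories)  # ['llp', 'ltd', 'plc']
print(codes)       # [1, 2, 1, 0]
```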

  8. Categoricals – over 10x speed up (on this data)!

  9. Categoricals – index queries faster! Circa 500x speed-up!

  10. float64 is default and a bit expensive

  11. float32 is “half-price” and a bit faster
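
The "half-price" claim can be checked with a short sketch like the one below, using random data as a stand-in for a real column. Note that float32 also carries less precision, so whether the trade is acceptable depends on your data.

```python
import numpy as np
import pandas as pd

# Stand-in data: one million random floats (float64 by default).
rng = np.random.default_rng(0)
ser64 = pd.Series(rng.random(1_000_000))
ser32 = ser64.astype("float32")

# index=False measures only the values, not the index.
bytes64 = ser64.memory_usage(index=False)
bytes32 = ser32.memory_usage(index=False)

# Largest rounding error introduced by the narrower dtype.
max_abs_error = float((ser64 - ser32.astype("float64")).abs().max())

print(f"float64: {bytes64:,} bytes, float32: {bytes32:,} bytes")
print(f"max absolute rounding error: {max_abs_error:.2e}")
```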

  12. Make choices to save RAM. Including the index (previously we ignored it) we still save circa 50% RAM, so you can fit in more rows of data

  13. “dtype_diet” gives you advice

  14. Drop to NumPy if you know you can. Caveat: Pandas mean is not np.mean; the fair comparison is to np.nanmean, which is slower. See my blog or PyDataAmsterdam 2020 talk for details
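
A small sketch of the caveat on slide 14, using a made-up Series with one missing value: the Pandas mean skips NaN by default, so it matches np.nanmean rather than np.mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
ser = pd.Series([1.0, 2.0, np.nan, 4.0])
arr = ser.to_numpy()

pandas_mean = ser.mean()         # skipna=True by default, so NaN is ignored
numpy_mean = np.mean(arr)        # NaN propagates into the result
numpy_nanmean = np.nanmean(arr)  # ignores NaN, like the Pandas default

print(pandas_mean, numpy_mean, numpy_nanmean)
```

So dropping to np.mean is only a like-for-like swap when you know the data has no missing values.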

  15. NumPy vs Pandas overhead (ser.sum()). Thanks! 25 files, 83 functions. Very few NumPy calls!

  16. Overhead...

  17. Overhead with ser.values.sum(): 18 files, 51 functions. Many fewer Pandas calls (but still a lot!)
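
One way to compare the two call paths from slides 15 and 17 yourself is sketched below, on stand-in data. The timings vary by machine and pandas version, so no numbers are claimed here; the point is only that both paths give the same total.

```python
import timeit

import numpy as np
import pandas as pd

# Stand-in data: whole-number floats, so both sums are exact.
ser = pd.Series(np.arange(1_000_000, dtype="float64"))

total_pandas = ser.sum()        # goes through the Pandas machinery
total_numpy = ser.values.sum()  # calls NumPy on the underlying array

# Time each path; always measure on your own data before switching.
t_pandas = timeit.timeit(lambda: ser.sum(), number=20)
t_numpy = timeit.timeit(lambda: ser.values.sum(), number=20)

print(f"ser.sum():        {t_pandas:.4f}s for 20 calls")
print(f"ser.values.sum(): {t_numpy:.4f}s for 20 calls")
```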

  18. Is Pandas unnecessarily slow? NO! See https://github.com/pandas-dev/pandas/issues/34773 (the truth is a bit complicated!)

  19. Being highly performant: install optional (but great!) Pandas dependencies, bottleneck and numexpr (https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html); investigate https://github.com/ianozsvald/dtype_diet; investigate my ipython_memory_usage (PyPI/Conda)

  20. Pure Python is “slow” and expressive. Deliberately poor function: pretend this is clever but slow!

  21. Compile to Numba judiciously. Near 10x speed-up!

  22. Parallelise with Dask for multi-core: make plain-Python code multi-core; note I had to drop the text index column due to speed-hit; data copy cost can overwhelm any benefits, so (always) profile & time

  23. Being highly performant: mistakes slow us down (PAY ATTENTION!); try nullable Int64 & boolean, forthcoming Float64; write tests (unit & end-to-end); lots more material & my newsletter on my blog IanOzsvald.com; time saving docs:
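
A brief sketch of the nullable dtypes mentioned on slide 23, with made-up values. It assumes the capitalised "Int64" and "boolean" extension dtypes, which keep integers and booleans intact when values are missing instead of falling back to float64 or object.

```python
import pandas as pd

# Default behaviour: a missing value forces integers to float64.
default_ints = pd.Series([1, 2, None])

# Nullable extension dtypes keep the intended type and use pd.NA.
nullable_ints = pd.Series([1, 2, None], dtype="Int64")
nullable_flags = pd.Series([True, None, False], dtype="boolean")

missing_count = int(nullable_ints.isna().sum())
total = int(nullable_ints.sum())  # missing values are skipped

print(default_ints.dtype, nullable_ints.dtype, nullable_flags.dtype)
print(missing_count, total)
```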

  24. Vaex / Modin: memory mapped & lazy computation, new string dtype (RAM efficient); Modin sits on Pandas, new “algebra” for dfs, drop-in replacement, easy to try. See talks on my blog:

  25. Summary: make it right then make it fast; think about being performant; see blog for my classes; I’d love a postcard if you learned something new!

  26. Covid 19’s effect on UK economy? Sharp decline in corporate registration after Lockdown, then apparent surge (perhaps just backed-up paperwork?). Will the recovery “last”? All open data, you can do similar things!
