Making Pandas Fly (live from London) EuroPython 2020 Ian Ozsvald @IanOzsvald – ianozsvald.com

Introductions  Interim Chief Data Scientist  19+ years experience Edition!  Team coaching & public courses d n 2 – I’m sharing from my Higher Performance Python course Ian Ozsvald By [ian]@ianozsvald[.com]

Thank the organisers!
• All volunteers – go say thank you in #lobby
• They’ve put in a huge amount of volunteered work for us!

Today’s goal
• Pandas
  – Saving RAM to fit in more data
  – Calculating faster by dropping to NumPy
• Advice for “being highly performant”
• Has Covid 19 affected UK Company Registrations?

Strings are expensive and slow

Categoricals are cheap and fast!
Circa 1% of previous memory cost
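
The memory saving can be reproduced with a minimal sketch. The column contents and row count below are made up for illustration (they are not the company registration data from the talk), and the exact ratio depends on how many distinct strings the column holds.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 100,000 rows drawn from three repeated labels.
rng = np.random.default_rng(0)
labels = [
    "Private Limited Company",
    "Charitable Incorporated Organisation",
    "Limited Partnership",
]
ser_str = pd.Series(rng.choice(labels, size=100_000), dtype=object)

# A categorical stores each distinct string once plus a small integer code per row.
ser_cat = ser_str.astype("category")

# deep=True counts the Python string objects themselves, not just the pointers.
bytes_str = ser_str.memory_usage(deep=True, index=False)
bytes_cat = ser_cat.memory_usage(deep=True, index=False)

print(f"object strings: {bytes_str / 1e6:.2f} MB")
print(f"categorical:    {bytes_cat / 1e6:.2f} MB")
print(f"categorical is {100 * bytes_cat / bytes_str:.1f}% of the string cost")
```

The saving shrinks as the number of distinct values grows, so it is worth measuring on your own column before converting.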

Categoricals “.cat” accessor
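
A short sketch of what the “.cat” accessor exposes, using a made-up five-row Series rather than the data shown on the slide:

```python
import pandas as pd

# Hypothetical labels for illustration only.
ser = pd.Series(["ltd", "plc", "ltd", "llp", "ltd"], dtype="category")

# The distinct labels are stored once (sorted by default).
print(list(ser.cat.categories))

# Each row holds only a small integer code pointing into the categories.
print(list(ser.cat.codes))

# Renaming edits the lookup table rather than rewriting every row.
renamed = ser.cat.rename_categories({"ltd": "Private Limited"})
print(renamed.iloc[0])
```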

Categoricals – over 10x speed up (on this data)!

Categoricals – index queries faster!
Circa 500x speed-up!
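
One way to time this yourself is sketched below. The labels, sizes and lookup key are assumptions for illustration; the speed-up you see will depend on your data and Pandas version, so no ratio is claimed here.

```python
from timeit import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 rows keyed by ten repeated labels.
rng = np.random.default_rng(0)
labels = [f"category_{i}" for i in range(10)]
keys = rng.choice(labels, size=200_000)
values = rng.random(200_000)

# Same values, once with a plain string index and once with a categorical index.
df_str = pd.DataFrame({"value": values}, index=pd.Index(keys, dtype=object))
df_cat = pd.DataFrame({"value": values}, index=pd.CategoricalIndex(keys))

# Time the same label lookup against each index.
t_str = timeit(lambda: df_str.loc["category_3"], number=20)
t_cat = timeit(lambda: df_cat.loc["category_3"], number=20)

print(f"string index:      {t_str:.4f} s")
print(f"categorical index: {t_cat:.4f} s")
```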

float64 is default and a bit expensive

float32 “half-price” and a bit faster
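
A minimal sketch of the “half-price” claim on random data (the values are hypothetical). It also measures the rounding error the conversion introduces, since float32 keeps roughly 7 significant decimal digits.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million random values in [0, 1), float64 by default.
ser64 = pd.Series(np.random.default_rng(0).random(1_000_000))
ser32 = ser64.astype("float32")

bytes64 = ser64.memory_usage(index=False)  # 8 bytes per value
bytes32 = ser32.memory_usage(index=False)  # 4 bytes per value
print(bytes64, bytes32)

# Largest absolute rounding error introduced by the down-cast.
max_err = (ser64 - ser32.astype("float64")).abs().max()
print(f"max rounding error: {max_err:.2e}")
```

Whether that precision loss is acceptable depends on the column, so check before down-casting.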

Make choices to save RAM
Including the index (previously we ignored it) we still save circa 50% RAM so you can fit in more rows of data

“dtype_diet” gives you advice

Drop to NumPy if you know you can
Caveat – Pandas mean is not np.mean; the fair comparison is to np.nanmean, which is slower. See my blog or PyDataAmsterdam 2020 talk for details
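
The caveat can be seen on a tiny made-up Series containing one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN.
ser = pd.Series([1.0, 2.0, np.nan, 4.0])

pandas_mean = ser.mean()             # Pandas skips NaN by default
numpy_mean = ser.values.mean()       # plain NumPy propagates NaN
numpy_nanmean = np.nanmean(ser.values)  # NumPy's NaN-aware equivalent

print(pandas_mean, numpy_mean, numpy_nanmean)
```

So dropping to `ser.values.mean()` is only safe if you know the data has no missing values.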

NumPy vs Pandas overhead (ser.sum())
Thanks!
25 files, 83 functions
Very few NumPy calls!

Overhead...

Overhead with ser.values.sum()
18 files, 51 functions
Many fewer Pandas calls (but still a lot!)

Is Pandas unnecessarily slow – NO!
https://github.com/pandas-dev/pandas/issues/34773 - the truth is a bit complicated!

Being highly performant  Install optional (but great!) Pandas dependencies – bottleneck https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html – numexpr  Investigate https://github.com/ianozsvald/dtype_diet  Investigate my ipython_memory_usage (PyPI/Conda) Ian Ozsvald By [ian]@ianozsvald[.com]

Pure Python is “slow” and expressive
Deliberately poor function – pretend this is clever but slow!

Compile to Numba judiciously
Near 10x speed-up!
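
A sketch of the pattern, not the function from the talk: `count_above_py` is a hypothetical, deliberately loop-heavy stand-in. The `ImportError` fallback is an assumption added so the sketch still runs (without any speed-up) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, keep the plain-Python function.
    def njit(func):
        return func


def count_above_py(arr, threshold):
    """Hypothetical slow function: count the values above a threshold with a Python loop."""
    count = 0
    for x in arr:
        if x > threshold:
            count += 1
    return count


# Compile the same function; the first call pays a one-off compilation cost.
count_above_nb = njit(count_above_py)

arr = np.random.default_rng(0).random(100_000)
result_py = count_above_py(arr, 0.5)
result_nb = count_above_nb(arr, 0.5)
print(result_py, result_nb)
```

Time the second and later calls when comparing, otherwise the compilation cost hides the gain.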

Parallelise with Dask for multi-core
• Make plain-Python code multi-core
• Note I had to drop text index column due to speed-hit
• Data copy cost can overwhelm any benefits so (always) profile & time

Being highly performant  Mistakes slow us down (PAY ATTENTION!) – Try nullable Int64 & boolean, forthcoming Float64 – Write tests (unit & end-to-end) – Lots more material & my newsletter on my blog IanOzsvald.com – Time saving docs: Ian Ozsvald By [ian]@ianozsvald[.com]

Vaex / Modin
• Memory mapped & lazy computation
  – New string dtype (RAM efficient)
• Modin sits on Pandas, new “algebra” for dfs
  – Drop in replacement, easy to try
See talks on my blog:

Summary  Make it right then make it fast  Think about being performant  See blog for my classes  I’d love a postcard if you learned something new! Ian Ozsvald By [ian]@ianozsvald[.com]

Covid 19’s effect on UK Economy?
Sharp decline in corporate registration after Lockdown – then apparent surge (perhaps just backed-up paperwork?). Will the recovery “last”?
All open data, you can do similar things!
