What’s new and awesome in pandas

pandas?

In [13]: foo
Out[13]:
   methyl1     age        edu  something  indic
0    38.36  30to39  geCollege          1  False
1    37.85    lt30  geCollege          1  False
2    38.57  30to39  geCollege          1  False
3    39.75  30to39  geCollege          1   True
4    43.83  30to39  geCollege          1   True
5    39.08  30to39       ltHS          1   True

Size-mutable “labeled arrays” that can handle heterogeneous data

Kinda like a structured array??
• Automatic data alignment with lots of reshaping and indexing methods
• Implicit and explicit handling of missing data
• Easy time series functionality
  – Far less fuss than scikits.timeseries
• Lots of in-memory SQL-like operations (group by, join, etc.)
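To make the alignment and missing-data bullets concrete, here is a minimal sketch using a current pandas release. The Series values, labels, and variable names are made up for illustration and do not come from the talk.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels; labels present on only one side become NaN.
total = s1 + s2

# Explicit handling of missing data: drop it, or fill it during the operation.
dropped = total.dropna()
filled = s1.add(s2, fill_value=0.0)

# A small SQL-like operation: group rows by a key and sum each group.
df = pd.DataFrame({"key": ["x", "y", "x"], "value": [1, 2, 3]})
sums = df.groupby("key")["value"].sum()
```

The result of `s1 + s2` is indexed by the union of both label sets, which is why nothing has to be reindexed by hand before combining the two.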

pandas?
• Extremely good for financial data
  – StackOverflow: “this is a beast of a financial analysis tool”
• One of the better relational data munging tools in any language?
• But also has maybe 60+% of what R users expect when they come to Python

1. Heavily redesigned internals
• Merged old DataFrame and DataMatrix into a single DataFrame: retain optimal performance where possible
• Internal BlockManager class manages homogeneous ndarrays for optimal performance and reshaping
• Better handling of missing data for non-floating point dtypes
• Soon: DataFrame variant with N-dim “hyperslabs”

2. Fancier indexing
Mix boolean / integer / label / slice-based indexing
    df.ix[0]
    df.ix[date1:date2]
    df.ix[:5, 'A':'F']
Setting works too
    df.ix[df['A'] > 0, ['B', 'C', 'D']] = nan
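The `.ix` indexer on this slide was later deprecated and removed from pandas. As a rough sketch of how the same selections might be written today, the block below uses `.loc` (labels) and `.iloc` (positions). The frame, its dates, and its values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: six daily rows, columns A through F.
dates = pd.date_range("2011-01-03", periods=6, freq="D")
df = pd.DataFrame(
    np.arange(36, dtype=float).reshape(6, 6) - 10.0,
    index=dates,
    columns=list("ABCDEF"),
)

first_row = df.iloc[0]                       # by integer position
window = df.loc["2011-01-04":"2011-01-06"]   # label slice, both ends included
block = df.iloc[:5].loc[:, "A":"F"]          # first five rows, columns A..F by label

# Setting works too: boolean row mask plus a list of column labels.
df.loc[df["A"] > 0, ["B", "C", "D"]] = np.nan
```

Splitting positional and label-based access across two indexers removes the ambiguity `.ix` had when an index held integers.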

3. More robust IO
    data_frame = read_csv('mydata.csv')
    data_frame2 = read_table('mydata.txt', sep='\t', skiprows=[1,2], na_values=['#N/A NA'])
    store = HDFStore('pytables.h5')
    store['a'] = data_frame
    store['b'] = data_frame2
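A self-contained sketch of the parsing options above, reading from in-memory strings instead of the slide's files so it runs anywhere. The file contents and the `"missing"` sentinel are assumptions made for the example. The `HDFStore` step is left out because it needs the optional PyTables dependency.

```python
import io

import pandas as pd

# Comma-separated text; "#N/A" is one of the markers read_csv treats as missing by default.
csv_text = "id,score\n1,0.5\n2,#N/A\n3,1.5\n"
data_frame = pd.read_csv(io.StringIO(csv_text))

# Tab-separated text with two junk lines after the header and a custom NA marker.
tsv_text = "id\tscore\njunk\tline\njunk\tline\n4\tmissing\n5\t2.5\n"
data_frame2 = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    skiprows=[1, 2],         # 0-based line numbers to skip (line 0 is the header)
    na_values=["missing"],   # extra strings to parse as NaN
)
```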

4. Better pivoting / reshaping

   foo  bar        A        B        C
0  one    a  -0.0524    1.664    1.171
1  one    b   0.2514   0.8306   -1.396
2  one    c   0.1256   0.3897   0.5227
3  one    d  -0.9301   0.6513  -0.2313
4  one    e    2.037    1.938  -0.3454
5  two    a   0.2073   0.7857   0.9051
6  two    b   -1.032  -0.8615    1.028
7  two    c  -0.7319   -1.846   0.9294
8  two    d   0.1004    -1.19   0.6043
9  two    e   -1.008  -0.3339  0.09522

4. Better pivoting / reshaping

In [29]: pivoted = df.pivot('bar', 'foo')

In [30]: pivoted['B']
Out[30]:
      one      two
a   1.664   0.7857
b  0.8306  -0.8615
c  0.3897   -1.846
d  0.6513    -1.19
e   1.938  -0.3339

4. Better pivoting / reshaping

In [31]: pivoted.major_xs('a')
Out[31]:
           A       B       C
one  -0.0524   1.664   1.171
two   0.2073  0.7857  0.9051

In [32]: pivoted.minor_xs('one')
Out[32]:
         A       B        C
a  -0.0524   1.664    1.171
b   0.2514  0.8306   -1.396
c   0.1256  0.3897   0.5227
d  -0.9301  0.6513  -0.2313
e    2.037   1.938  -0.3454
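The `major_xs` / `minor_xs` calls belong to the old Panel container, which has since been removed from pandas. Below is a hedged sketch of how the same three views might be obtained from a current release, where `pivot` without `values` returns a DataFrame with two-level columns. It assumes `bar` runs a through e within each `foo` group, as the pivoted output implies; the names `major` and `minor` are illustrative.

```python
import pandas as pd

# The slide's data, with bar assumed to run a..e inside each foo group.
df = pd.DataFrame({
    "foo": ["one"] * 5 + ["two"] * 5,
    "bar": list("abcde") * 2,
    "A": [-0.0524, 0.2514, 0.1256, -0.9301, 2.037,
          0.2073, -1.032, -0.7319, 0.1004, -1.008],
    "B": [1.664, 0.8306, 0.3897, 0.6513, 1.938,
          0.7857, -0.8615, -1.846, -1.19, -0.3339],
    "C": [1.171, -1.396, 0.5227, -0.2313, -0.3454,
          0.9051, 1.028, 0.9294, 0.6043, 0.09522],
})

# Rows indexed by bar; columns become a two-level index (A/B/C, then foo).
pivoted = df.pivot(index="bar", columns="foo")

b_table = pivoted["B"]                        # bar x foo table for column B

# Rough stand-in for major_xs('a'): one bar label, laid out as foo x (A, B, C).
major = pivoted.loc["a"].unstack(level=0)

# Rough stand-in for minor_xs('one'): one foo label, laid out as bar x (A, B, C).
minor = pivoted.xs("one", axis=1, level=1)
```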

5. Some other things
• “Sparse” (mostly NA) versions of data structures
• Time zone support in DateRange
• Generic moving window function rolling_apply
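The top-level `rolling_apply` function named above is not available in current pandas; the same idea is now written as `.rolling(...).apply(...)`. A minimal sketch under that assumption, with made-up data and a hypothetical `value_range` helper:

```python
import pandas as pd

s = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

def value_range(window):
    """Spread (max minus min) of one window, received as a NumPy array."""
    return window.max() - window.min()

# Apply an arbitrary function over a sliding window of three observations.
# The first two positions have no full window yet, so they come out as NaN.
rolled = s.rolling(window=3).apply(value_range, raw=True)
```

Passing `raw=True` hands each window to the function as a plain ndarray rather than a Series, which is usually cheaper for simple numeric reductions.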

Near future
• More powerful Group By
• Flexible, fast frequency (time series) conversions
• More integration with statsmodels

Thanks!
• Hack: github.com/wesm/pandas
• Twitter: @wesmckinn
• Blog: blog.wesmckinney.com
