Python Data Processing with Pandas CSE 5542 Introduction to Data Visualization
Pandas • A very powerful Python package for manipulating tables • Built on top of NumPy, so it is efficient • Saves you the effort of writing lower-level Python code for manipulating, extracting, and deriving table-related information • Easy visualization with Matplotlib • Main data structures – Series and DataFrame
• First things first • Series: an indexed 1D array
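The slide's code is not preserved, so the following is only a minimal sketch of what "first things first" usually means: importing the package under the conventional `pd` alias and building a Series. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series pairs a 1D array of values with an index.
s = pd.Series([0.25, 0.5, 0.75, 1.0])

print(s.values)  # the underlying NumPy array
print(s.index)   # default integer index 0..3
```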
• Explicit index • Access data
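As a sketch of the explicit index (labels and values here are illustrative assumptions), a Series can be given string labels and then accessed by label:

```python
import pandas as pd

# The index does not have to be 0..N-1; any labels can be supplied.
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])

b_value = s["b"]      # access by label
same_value = s.loc["b"]  # .loc makes the label-based lookup explicit
print(b_value, same_value)
```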
• Can work as a dictionary • Access and slice data
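One way to illustrate the dictionary-like behavior, assuming a small made-up `population` Series (the numbers are placeholders, not real data):

```python
import pandas as pd

# Building a Series from a dict: keys become the index.
population = pd.Series({"California": 38.3, "Texas": 26.4, "New York": 19.6})

texas = population["Texas"]          # dictionary-style lookup
has_texas = "Texas" in population    # membership tests check the index

# Unlike a dict, a Series can be sliced.
by_label = population["California":"Texas"]  # label slice, end label included
by_position = population.iloc[0:2]           # positional slice, end excluded
large = population[population > 20]          # boolean mask
```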
DataFrame Object • Generalized two dimensional array with flexible row and column indices
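A minimal sketch of a DataFrame as a labeled 2D array; the row labels `a`, `b`, `c` and column labels `x`, `y` are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# A 3x2 array with explicit row and column labels.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["a", "b", "c"],
    columns=["x", "y"],
)

print(df)
cell = df.loc["b", "y"]  # row label, then column label
column = df["x"]         # each column is a Series
```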
DataFrame Object • From Pandas Series
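A sketch of building a DataFrame from Series, assuming two made-up Series that share the same index (values are placeholders):

```python
import pandas as pd

population = pd.Series({"California": 38.3, "Texas": 26.4, "New York": 19.6})
area = pd.Series({"California": 424.0, "Texas": 696.0, "New York": 141.0})

# A single Series becomes a one-column DataFrame.
single = population.to_frame(name="population")

# A dict of Series becomes one column per Series, aligned on the index.
states = pd.DataFrame({"population": population, "area": area})
print(states)
```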
DataFrame Object • Another example
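The slide's own example is not preserved. Since later slides refer to columns A, B, C, and D, a plausible sketch (an assumption, not the original code) is a date-indexed table of random numbers:

```python
import numpy as np
import pandas as pd

# Six consecutive dates as the row index.
dates = pd.date_range("2013-01-01", periods=6)

# Seeded generator so the example is reproducible.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))
print(df)
```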
Viewing Data • View the first or last N rows
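A short sketch using `head` and `tail`, assuming a small date-indexed table with columns A to D filled with the integers 0 to 23:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

first_rows = df.head(3)  # first 3 rows (default is 5)
last_rows = df.tail(2)   # last 2 rows
print(first_rows)
print(last_rows)
```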
Viewing Data • Display the index, columns, and data
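The three parts of a DataFrame can be inspected separately. A sketch on the same assumed 6x4 table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row_labels = df.index     # DatetimeIndex of the six dates
col_labels = df.columns   # Index(['A', 'B', 'C', 'D'])
data = df.to_numpy()      # the values as a plain NumPy array
print(row_labels, col_labels, data, sep="\n")
```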
Viewing Data • Quick statistics (for columns A B C D in this case)
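Quick statistics are typically obtained with `describe`. A sketch on an assumed table with columns A to D:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# One column of summary statistics per numeric column:
# count, mean, std, min, quartiles, max.
stats = df.describe()
print(stats)
```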
Viewing Data • Sorting: sort by the index (i.e., reorder rows or columns by their labels), not by the data values in the table
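A sketch of sorting by labels with `sort_index`, on an assumed table with columns A to D:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# axis=1 reorders the columns by their labels (here: D, C, B, A).
cols_reversed = df.sort_index(axis=1, ascending=False)

# axis=0 reorders the rows by their labels (here: latest date first).
rows_reversed = df.sort_index(axis=0, ascending=False)
```

Both calls return a new DataFrame and leave `df` unchanged.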
Viewing Data • Sorting: sort by the data values
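Sorting by data values uses `sort_values`. A sketch with a small made-up table whose column B is deliberately out of order:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [9, 7, 8]}, index=["x", "y", "z"])

# Rows are reordered so that column B is ascending.
by_b = df.sort_values(by="B")
by_b_desc = df.sort_values(by="B", ascending=False)
print(by_b)
```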
Selecting Data • Selecting using a label
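A sketch of label-based selection, assuming the date-indexed table with columns A to D used for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col_a = df["A"]               # a column label selects a Series
first_row = df.loc[dates[0]]  # a row label with .loc selects one row
print(col_a)
print(first_row)
```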
Selecting Data • Multi-axis, by label
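With `.loc`, rows and columns can be selected together. A sketch on the assumed A to D table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# All rows, two named columns.
two_cols = df.loc[:, ["A", "B"]]

# One row label and one column label give a scalar.
scalar = df.loc[dates[0], "A"]
fast_scalar = df.at[dates[0], "A"]  # .at is the scalar-only accessor
```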
Selecting Data • Multi-axis, by label • Slicing: the last label is included
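The inclusive endpoint is easiest to see next to a positional slice. A sketch on the assumed A to D table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label slices include both endpoints: 3 dates and 3 columns (A, B, C).
by_label = df.loc["2013-01-02":"2013-01-04", "A":"C"]

# Positional slices follow normal Python rules and exclude the end.
by_position = df.iloc[1:3, 0:2]
```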
Selecting Data • Select by position
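Position-based selection goes through `.iloc`. A sketch on the assumed A to D table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

fourth_row = df.iloc[3]            # a single row by integer position
block = df.iloc[3:5, 0:2]          # rows 3-4, columns 0-1
picked = df.iloc[[1, 2, 4], [0, 2]]  # explicit lists of positions
value = df.iat[1, 1]               # scalar by position
```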
Selecting Data • Boolean indexing
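A sketch of three common forms of boolean indexing; the extra column `E` and its values are hypothetical, added only to demonstrate `isin`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Filter rows using a condition on one column.
rows = df[df["A"] > 8]

# A condition on the whole table keeps the shape and puts NaN where it fails.
masked = df[df > 10]

# isin() filters on membership in a set of values.
df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
selected = df2[df2["E"].isin(["two", "four"])]
```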
Setting Data • Setting a new column aligned by indexes
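Index alignment can be seen by assigning a Series whose dates are shifted by one day. A sketch; the column name `F` is an arbitrary choice:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# This Series starts one day later than df's index.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("2013-01-02", periods=6))

# Values are matched by index label, not by position:
# 2013-01-01 has no match and becomes NaN; 2013-01-07 is dropped.
df["F"] = s1
print(df)
```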
Setting Data
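This slide's example is not preserved. As a sketch under that caveat, values can be set by label, by position, for a whole column, or through a boolean mask:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4), index=dates, columns=list("ABCD")
)

df.at[dates[0], "A"] = -1.0           # by label
df.iat[0, 1] = -2.0                   # by position
df.loc[:, "D"] = np.full(len(df), 5.0)  # a whole column from a NumPy array
df[df > 15] = 0.0                     # everywhere a condition holds
print(df)
```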
Operations • Descriptive statistics – Across axis 0 (rows), i.e., column mean – Across axis 1 (columns), i.e., row mean
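The two axes can be compared directly with `mean`. A sketch on the assumed A to D table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# axis=0 (the default) collapses the rows: one mean per column.
col_means = df.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = df.mean(axis=1)
print(col_means)
print(row_means)
```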
Operations • Apply • Histogram
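A sketch of both operations. It assumes that "histogram" refers to counting occurrences with `value_counts`; the slide may instead have shown a plotted histogram.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# apply() calls the function once per column (axis=0 by default).
spread = df.apply(lambda col: col.max() - col.min())

# value_counts() tallies how often each value occurs.
s = pd.Series([1, 2, 2, 3, 3, 3])
counts = s.value_counts()
print(spread)
print(counts)
```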
Merge Tables • Join
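A sketch of a database-style join with `pd.merge`; the tables and the column names `key`, `lval`, and `rval` are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [4, 5, 6]})

# Inner join (the default): only keys present in both tables.
inner = pd.merge(left, right, on="key")

# Left join: every row of `left`, with NaN where `right` has no match.
left_join = pd.merge(left, right, on="key", how="left")
print(inner)
print(left_join)
```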
Merge Tables • Append
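The slide's code is not preserved and may have used `DataFrame.append`, which was removed in pandas 2.0. A sketch of the same idea using `pd.concat`, with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
new_rows = pd.DataFrame({"A": [5], "B": [6]})

# Stack the tables vertically and renumber the index 0..N-1.
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```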
Grouping
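Grouping follows a split, apply, combine pattern. A minimal sketch with a made-up table:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1, 2, 3, 4],
    }
)

# Split rows by the value in A, then sum column C within each group.
sums = df.groupby("A")["C"].sum()

# Grouping by several columns produces a hierarchical index.
sums_ab = df.groupby(["A", "B"])["C"].sum()
print(sums)
print(sums_ab)
```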
File I/O • CSV
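A sketch of a CSV round trip. The file name `example.csv` is hypothetical, and a temporary directory is used so nothing is left on disk:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    df.to_csv(path, index=False)  # index=False skips the row labels
    loaded = pd.read_csv(path)

print(loaded)
```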
File I/O • Excel
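A sketch of the Excel equivalent. It assumes the optional `openpyxl` engine is installed; if it is not, the example skips the round trip. The file and sheet names are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
loaded = None

try:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.xlsx")
        df.to_excel(path, sheet_name="Sheet1", index=False)
        loaded = pd.read_excel(path, sheet_name="Sheet1")
except ImportError:
    # pandas needs an Excel engine such as openpyxl for .xlsx files.
    print("openpyxl is not installed; skipping the Excel example")

print(loaded)
```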
Recommendations
More recommendations