ARTIFICIAL INTELLIGENCE AND PYTHON DAY 1 STANLEY LIANG, LASSONDE SCHOOL OF ENGINEERING, YORK UNIVERSITY
WHAT IS PYTHON • An interpreted, high-level programming language for general-purpose programming. • Python features a dynamic type system and automatic memory management. • Python supports multiple programming paradigms, including object-oriented, imperative, functional and procedural. • Python also has a large and comprehensive standard library and many third-party packages for different purposes. • In this course, we will use the Anaconda distribution with various Python tools to implement AI and machine learning tasks.
INSTALL SOFTWARE • Download and install Python 3.6 from https://www.python.org/downloads/ • Visit https://anaconda.org/anaconda/python • For a PC with Windows 10: open a command line and type: conda install -c anaconda python • For Mac, download from: https://www.anaconda.com/download/#macos • Open Anaconda Navigator and launch Jupyter • Jupyter is an interactive, browser-based notebook environment for Python, used here in place of an IDE (integrated development environment) • Other choices: PyCharm, Visual Studio, Spyder, etc.
BASIC TYPES AND ASSIGNMENT • String - unlike C, Python has no char; a single character is a string of length 1 • Number - unlike C, Python needs no int or float / double declarations; the type (int, float) follows the assigned value • Boolean - True / False, capitalize the first letter • Multiple Assignment • The null value – None, not null
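The points on this slide can be tried out with a short sketch like the one below. All variable names and values are made up for illustration. Note that values still have a type such as int or float at run time; the type is simply never declared.

```python
# Strings: a single character is just a string of length 1 (no char type).
name = "Python"
first = name[0]
print(type(first).__name__, len(first))

# Numbers: no declarations; the type follows the value.
count = 42        # an int
ratio = 3.14      # a float
big = 2 ** 100    # still an int; Python integers do not overflow

# Booleans start with a capital letter.
is_ready = True

# Multiple assignment, and a swap without a temporary variable.
a, b, c = 1, 2.5, "three"
a, b = b, a

# None is Python's null value; test for it with "is".
result = None
print(result is None)
```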
FLOW CONTROL • Be careful with the indentation: it defines the code blocks • Branching: if / elif / else (there is no “then” keyword) • Iteration: for-loop and while-loop; there is no native do-while-loop
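A minimal sketch of the three constructs, with made-up names and values. The last loop shows one common way to emulate a do-while-loop; it is a convention, not a language feature.

```python
def classify(n):
    """Branching with if / elif / else; the indented lines form each block."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# For-loop: iterate directly over the items of a sequence.
labels = []
for value in [-2, 0, 7]:
    labels.append(classify(value))

# While-loop: repeat as long as the condition holds.
countdown = 3
while countdown > 0:
    countdown -= 1

# Do-while substitute: "while True" with a break runs the body at least once.
attempts = 0
while True:
    attempts += 1
    if attempts >= 3:
        break
```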
DATA STRUCTURES • Tuple - a read-only (immutable) collection of items • List - uses the square bracket notation and can be indexed using array notation • Dictionary - a mapping of names to values, i.e. key-value pairs. Note the use of curly braces • Summary • Tuple uses ( ), List uses [ ], Dictionary uses { }, typically with quoted strings ‘ ’ for the keys • To subset, always use [ ]
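The three containers side by side, as a small sketch with made-up data. Whatever brackets create the container, items are always read back with [ ].

```python
point = (3, 4)                     # tuple: round brackets, read-only
scores = [90, 75, 88]              # list: square brackets, can be changed
ages = {"alice": 30, "bob": 25}    # dictionary: curly braces, key: value

x = point[0]          # subset a tuple by position
scores[1] = 80        # lists can be modified in place
scores.append(95)
ages["carol"] = 41    # add a new key-value pair
first_two = scores[:2]  # slicing also uses [ ]

# Tuples cannot be modified: assignment raises a TypeError.
try:
    point[0] = 10
except TypeError as err:
    tuple_error = str(err)
```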
FUNCTIONS IN PYTHON • A function in Python is defined with the keyword “def”, i.e. define • Do not use “func” or “function” as the keyword, and remember to use the parentheses “( )” to mark a function call • The trickiest thing in Python is the whitespace: the function body must be indented consistently. • In the interactive interpreter, leave an empty line after the indented code to end the block. • A function can have zero, one or more arguments, and does not declare a return type; without a return statement it returns None
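A short sketch of the cases mentioned on this slide. The function names greet, add and min_max are invented for the example.

```python
def greet():
    """No arguments and no return statement: the call returns None."""
    print("Hello")


def add(a, b=10):
    """Two arguments; b has a default value and may be omitted."""
    return a + b


def min_max(values):
    """Return two values at once (they are packed into a tuple)."""
    return min(values), max(values)


nothing = greet()               # the parentheses make this a function call
total = add(5)                  # uses the default b=10
low, high = min_max([4, 1, 9])  # unpack the returned tuple
```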
NUMPY • NumPy provides the foundational data structures and operations for SciPy • These are arrays (ndarrays) that are efficient to define and manipulate • Before use, you need to import the numpy package • If you use Anaconda, numpy is installed by default • If Python cannot find it, use pip or conda to install it from the command line • python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
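A minimal sketch of defining and manipulating ndarrays, assuming numpy is installed. The array contents are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])                 # one-dimensional array
m = np.array([[1, 2, 3], [4, 5, 6]])    # two-dimensional array
print(m.shape)                          # rows, columns

doubled = a * 2               # arithmetic is applied element by element
summed = m + a                # a is broadcast across each row of m
col_means = m.mean(axis=0)    # mean of each column

second_row = m[1]             # index a row
corner = m[0, -1]             # row 0, last column
```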
DATA VISUALIZATION • In Python, we can visualize data with the Matplotlib package • Matplotlib can be used for creating plots and charts • The general procedure to use Matplotlib • import matplotlib.pyplot as plt • Call a plotting function such as plt.plot( ) or plt.scatter( ), etc. • Call the plot property configuration functions such as plt.xlabel( ), plt.ylabel( ), plt.xlim( ), plt.ylim( ), etc. • Call plt.title( ), plt.text( ), etc. to add annotations • Display the configured plot with plt.show( )
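The procedure on this slide could look like the sketch below. The data and labels are invented, and the matplotlib.use("Agg") line is an assumption added so the script also runs on a machine without a display; in Jupyter or a desktop session, leave that line out so plt.show() opens the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this in Jupyter
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

plt.figure()                    # start a fresh figure
plt.plot(x, y)                  # line plot
plt.scatter(x, y)               # markers on the same axes

plt.xlabel("x")                 # plot properties
plt.ylabel("y")
plt.xlim(0, 5)

plt.title("Squares")            # annotations
plt.text(1, 10, "y = x^2")

ax = plt.gca()                  # keep a handle to the axes for inspection
plt.show()                      # display the configured plot
```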
PANDAS AND DATAFRAME • Pandas provides data structures and functionality to quickly manipulate and analyze data • The two important data structures in Pandas • Series - a one-dimensional array of data where the rows are labeled by an index (often a time axis) • Subset a Series by index • DataFrame - a two-dimensional table where the rows and the columns can be labeled • Subset a DataFrame by columns • Subset a DataFrame by rows
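A small sketch of both structures and the subsetting operations listed above, assuming pandas is installed. The labels and numbers are made up.

```python
import pandas as pd

# Series: one-dimensional data with a labeled index.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])
tue = temps["tue"]       # subset by index label
first = temps.iloc[0]    # subset by position

# DataFrame: labeled rows and labeled columns.
df = pd.DataFrame(
    {"height": [170, 182, 165], "weight": [65, 80, 54]},
    index=["ann", "ben", "cat"],
)

heights = df["height"]           # subset by column (gives a Series)
ben = df.loc["ben"]              # subset by row label
first_two = df.iloc[0:2]         # subset by row position
tall = df[df["height"] > 168]    # subset rows by a condition
```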
LOAD DATA FROM A CSV FILE • Before starting machine learning, you should load your data into Python • The most common format for machine learning data is CSV files • Three ways to load a CSV into Python • Load CSV Files with the Python Standard Library • Load CSV Files with NumPy • Load CSV Files with Pandas (recommended) • A CSV can come from two sources • Local machine – always use ‘/’ as the path separator (it also works on Windows) • From a URL – using urllib.request.urlopen or pandas.read_csv
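The three loading routes can be compared with the sketch below. To stay self-contained it reads a tiny, made-up CSV from an in-memory string instead of a real file; with real data you would pass a file path or, for pandas.read_csv, a URL in place of the io.StringIO object.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
CSV_TEXT = "x,y,label\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"

# 1. Python standard library: rows come back as lists of strings.
reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)
rows = [[float(v) for v in row] for row in reader]

# 2. NumPy: numeric data only, so the header row is skipped.
array = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# 3. Pandas: the header becomes the column names, types are inferred.
df = pd.read_csv(io.StringIO(CSV_TEXT))
```

Pandas needs the least code and keeps the column names, which is one reason to prefer it.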
UNDERSTAND YOUR DATA • You must understand your data in order to get the best results • Take a peek at the raw data for a first impression • Review the dimensions of the dataset • Review the data types of the attributes (columns) • Summarize the distribution of instances across classes in your dataset • Summarize your data using descriptive statistics • Understand the relationships in your data using correlations • Review the skew of the distribution of each attribute
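With pandas, each review step above maps to roughly one call. The sketch below uses a tiny invented DataFrame; the column names, including the class column "label", are assumptions for illustration.

```python
import pandas as pd

# Hypothetical dataset: two attributes and a class column.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "income": [28, 52, 61, 75, 40, 58],
    "label": [0, 1, 1, 1, 0, 1],
})

peek = df.head(3)                             # first rows, for a first impression
shape = df.shape                              # (rows, columns)
dtypes = df.dtypes                            # data type of each column
class_counts = df.groupby("label").size()     # instances per class
summary = df.describe()                       # count, mean, std, min, quartiles, max
correlations = df.corr()                      # pairwise Pearson correlations
skew = df.skew()                              # skew of each attribute

print(summary)
```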
VISUALIZE YOUR DATA • You must understand your data in order to get the best results from machine learning algorithms. • The intuitive way to learn more about your data is to visualize it. • Plots for univariate data (one variable): Histogram, Density Plot, Box & Whisker Plot • Plots for multivariate data (more than one variable): Correlation Matrix Plot, Scatter Plot Matrix
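Most of these plots can be sketched with pandas and Matplotlib as below. The data is random and the column names are invented. The density plot is left out of the code because pandas needs SciPy for it; with SciPy installed, df.plot(kind="density") would produce it. The matplotlib.use("Agg") line is an assumption so the script runs without a display; omit it in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: c is constructed to correlate strongly with a.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
df["c"] = df["a"] * 2 + rng.normal(scale=0.5, size=100)

# Univariate plots
hist_axes = df.hist()            # one histogram per column
box_ax = df.plot(kind="box")     # box and whisker plot

# Multivariate plots
corr = df.corr()
fig, ax = plt.subplots()
image = ax.matshow(corr, vmin=-1, vmax=1)   # correlation matrix plot
fig.colorbar(image)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

matrix_axes = scatter_matrix(df)  # scatter plot matrix

plt.show()
```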
PREPARE YOUR DATA FOR MACHINE LEARNING • Many machine learning algorithms make assumptions about your data • Different algorithms require different data transforms, i.e. data preprocessing • Prepare the data to best expose the structure of the problem: Rescale data, Standardize data, Normalize data, Binarize data • The scikit-learn library for Python provides two standard methods for transforming data: Fit and Multiple Transform, and Combined Fit-And-Transform
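The four transforms and the two calling styles might look like the sketch below, assuming scikit-learn is installed. The input matrix and the threshold of 2.0 are made-up values; which transform is appropriate depends on the algorithm you plan to use.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler, Normalizer, StandardScaler

# Hypothetical data: three rows, two attributes on very different scales.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Fit and multiple transform: learn the parameters once, then reuse them,
# for example on the training data and later on new data.
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X)
rescaled = scaler.transform(X)
new_rescaled = scaler.transform(np.array([[1.5, 250.0]]))

# Combined fit-and-transform: one call when the data is transformed only once.
standardized = StandardScaler().fit_transform(X)       # each column: mean 0, std 1
normalized = Normalizer().fit_transform(X)             # each row: length 1
binarized = Binarizer(threshold=2.0).fit_transform(X)  # 1 if value > 2.0, else 0
```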