
Introduction in ML with scikit-learn - Professor Patrick McDaniel



  1. Introduction in ML with scikit-learn, Professor Patrick McDaniel, Jonathan Price, Fall 2015

  2. Features • Attributes in a data set • “Individual measurable property of phenomenon being observed” • Choosing/discovering features is a crucial part of ML • Ex: ‣ Character Recognition: histograms of pixels ‣ Speech Recognition: sound length, power, frequency ‣ Malware Detection: function use counts, byte counts
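The character-recognition example above (histograms of pixels) can be made concrete with a short sketch. The 8x8 image below is synthetic, invented purely for illustration, not drawn from any real dataset:

```python
import numpy as np

# Synthetic 8x8 grayscale "character" image with intensities 0-15
# (a stand-in for a real scanned digit)
img = np.arange(64).reshape(8, 8) % 16

# A 16-bin intensity histogram is one simple feature vector:
# each bin counts how many pixels fall in that intensity range.
features, _ = np.histogram(img, bins=16, range=(0, 16))
print(features)        # one count per intensity level
print(features.sum())  # total equals the number of pixels (64)
```

The histogram discards pixel positions but keeps the intensity distribution, which is the trade-off any feature choice makes.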

  3. Supervised Learning • Inferring a function from labeled training data • The features are selected by the developer • As such, it requires the developer to know something about the dataset to infer good features • Based on pairs of input objects and output values • Ex: ‣ Regression – Predict values ‣ Classification – Predict groupings
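The regression/classification split on this slide can be sketched with two tiny scikit-learn estimators. The toy data below is invented for illustration; both estimators learn from labeled (input, output) pairs:

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Labeled training inputs
X = [[0], [1], [2], [3]]

# Regression: predict a continuous value from noisy outputs
reg = LinearRegression().fit(X, [0.1, 0.9, 2.1, 2.9])
print(reg.predict([[4]]))    # a value near 4

# Classification: predict a discrete group
clf = KNeighborsClassifier(n_neighbors=1).fit(X, [0, 0, 1, 1])
print(clf.predict([[2.6]]))  # group 1 (nearest training point is x=3)
```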

  4. Unsupervised Learning • Find hidden structure or patterns in unlabeled data • Requires no prior knowledge of the nature of the data • Not limited by biases inherent in feature selection • Ex: ‣ K-means ‣ Clustering ‣ Neural networks
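As an example of the k-means entry above: the estimator groups unlabeled points purely by proximity; note that fit() receives no target labels at all. The toy points are invented:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two visually obvious groups, but no labels provided
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

# Ask k-means to discover 2 clusters on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # first two points share one label, last two the other
```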

  5. Scikit-learn • The easy way to do data mining and data analysis • It’s all Python scripts (yay) • Built on NumPy, SciPy, and matplotlib • Okay, let’s get it: ‣ pip install numpy scipy scikit-learn

  6. Let’s Do One • Classification of digits problem • Classify images of drawn numbers

  7. Before We Start • What can we use about the image of a character to solve this problem?

  8. Dataset • A dataset object in scikit-learn is a dictionary-like object that holds all the data (and some metadata). • The actual data is stored as an (n_samples, n_features) array • Let’s get the digits dataset: >>> from sklearn import datasets >>> digits = datasets.load_digits()

  9. Dataset

  10. Dataset • “digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by the other 14 are used for writer independent testing” • 500 x 500 pixel characters, compressed to form 8x8 images (and then a feature vector of length 64)
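The compression described above ends at small pixel grids; scikit-learn keeps both forms, and each 64-element feature vector is just the corresponding image flattened:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
print(digits.images.shape)  # (1797, 8, 8): the 8x8 pixel grids

# Each row of digits.data is the matching image unrolled to length 64
print(np.array_equal(digits.images[0].ravel(), digits.data[0]))
```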

  11. Let’s Do Some Estimating • We’re going to use support vector classification (SVC). We’ll explain later. • This code sets up the classifier clf: >>> from sklearn import svm >>> clf = svm.SVC(gamma=0.001, C=100.) • We will also treat this as a black box and come back to the gamma/C values later
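The slides treat gamma and C as a black box. One common way to choose such values, not shown in the original deck, is a cross-validated grid search; a sketch on a subset of the digits for speed:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

# Try a small grid of gamma/C combinations with 3-fold cross-validation
params = {"gamma": [0.0001, 0.001, 0.01], "C": [1.0, 10.0, 100.0]}
search = GridSearchCV(svm.SVC(), params, cv=3)
search.fit(digits.data[:500], digits.target[:500])
print(search.best_params_)  # the best-scoring gamma/C pair
```

Loosely: gamma controls how far a single training example's influence reaches, and C trades margin width against training errors.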

  12. Fit And Predict • To fit the classifier: >>> clf.fit(digits.data[:-1], digits.target[:-1]) • Now, we predict! >>> clf.predict(digits.data[-1:]) array([8]) • Which is apparently this image from before:
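Predicting the single held-out sample above is a smoke test. A more honest evaluation, an addition not in the slides, holds out a proper test set and measures accuracy:

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Same estimator and hyperparameters as the slide
clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(acc)  # typically around 0.99 for this setup
```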

  13. It’s (Sort of) That Easy! • We glossed over a couple of details, but this shows how easy scikit-learn makes the actual implementation • Let’s talk about some of the concepts we skipped over earlier

  14. SVCs • We are NOT going into implementation details. • Used for classification, regression, and detecting outliers • Advantages: ‣ Works in high-dimensional spaces ‣ Memory efficient ‣ Versatile • Disadvantages: ‣ Bad when # of features > # of samples ‣ Don’t directly provide probability estimates
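On that last disadvantage: SVC can still emit class probabilities, but only via an extra calibration step enabled with probability=True, which runs internal cross-validation and is therefore slower. A sketch on a subset of the digits:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# probability=True bolts probability calibration onto the SVC
clf = svm.SVC(gamma=0.001, C=100.0, probability=True)
clf.fit(digits.data[:500], digits.target[:500])

proba = clf.predict_proba(digits.data[500:501])
print(proba.shape)  # (1, 10): one probability per digit class
print(proba.sum())  # each row sums to 1
```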

  15. SVC: Graphically

  16. Next Week • Next, we will go over a security usage of data analysis: a malware classification Kaggle challenge from Microsoft • See the course site for supplemental readings and setup instructions
