Classification
CSCI 1951A: Data Science, Brown University
March 19, 2020
Instructor: Ellie Pavlick
HTAs: Josh Levin, Diane Mutako, Sol Zitter
Today
• Generative vs. Discriminative Models
• KNN, Naive Bayes, Logistic Regression
• SciKit Learn Demo
Supervised vs. Unsupervised Learning
• Supervised: explicit data labels
  • Sentiment analysis: review text -> star ratings
  • Image tagging: image -> caption
• Unsupervised: no explicit labels
  • Clustering: find groups of similar customers
  • Dimensionality reduction: find features that differentiate individuals
Classification
One goal: estimate P(Y|X), where Y is the label and X are the features.
• P(email is spam | words in the message)
• P(genre of song | tempo, harmony, lyrics…)
• P(article clicked | title, font, photo…)
[Figure: K-Means clusters on a scatter plot of songs, tempo (x-axis) vs. harmonic complexity (y-axis), with cluster centers marked X]
[Figures: K Nearest Neighbors on the same tempo vs. harmonic complexity plot. A new point ("Blue or Red?") is classified by majority vote among its K nearest labeled neighbors, shown for K = 1 and K = 5.]
K Nearest Neighbors
• Arguably the simplest ML algorithm
• "Non-parametric": no assumptions about the form of the classification model
• All the work is done at classification time
• Works with tiny amounts of training data (a single example per class)
• The best classification model ever???
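A minimal sketch of KNN with scikit-learn, using invented tempo and harmonic-complexity values in the spirit of the figures above (in practice you would scale the features so tempo doesn't dominate the distance):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (values invented): [tempo, harmonic complexity]
X_train = [[120, 0.20], [125, 0.30], [118, 0.25],
           [80, 0.80], [75, 0.90], [82, 0.85]]
y_train = ["blue", "blue", "blue", "red", "red", "red"]

# K = 5: classify by majority vote among the 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "training" just stores the examples

# All the work happens here, at classification time
print(knn.predict([[100, 0.5]]))
```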
Supervised Classification
See scikit-learn's classifier comparison for how different models carve up the same feature space: https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
Generative vs. Discriminative Models

Generative Models:
• Estimate P(X, Y) first
• Can assign probability to observations and generate new observations
• Often more parameters, but more flexible
• Examples: Naive Bayes, Bayes Nets, VAEs, GANs

Discriminative Models:
• Estimate P(Y|X) directly, or use no explicit probability model at all
• Only support classification; less flexible
• Often fewer parameters; better performance on small data
• Examples: Logistic Regression, SVMs, Perceptrons; also KNN (no explicit probability model)
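A hedged sketch contrasting the two families on toy data (values invented for illustration): GaussianNB is generative, modeling P(X|Y) and P(Y) and then applying Bayes rule, while LogisticRegression models P(Y|X) directly:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Toy song data (values invented): [tempo, harmonic complexity]
X = [[120, 0.20], [125, 0.30], [118, 0.25],
     [80, 0.80], [75, 0.90], [82, 0.85]]
y = [0, 0, 0, 1, 1, 1]

# Generative: model P(X|Y) and P(Y), get P(Y|X) via Bayes rule
gen = GaussianNB().fit(X, y)

# Discriminative: model P(Y|X) directly
disc = LogisticRegression().fit(X, y)

x_new = [[100, 0.5]]
print(gen.predict_proba(x_new))   # from P(X|Y)P(Y) / P(X)
print(disc.predict_proba(x_new))  # estimated directly
```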
Supervised Classification

Example: wine reviews with star ratings, binarized into labels (1 = positive, 0 = negative):

Lovely mushroomy nose and good length. (*****) -> 1
Gamy, succulent tannins. Lovely. (****) -> 1
Provence herbs, creamy, lovely. (****) -> 1
Good if not dramatic fizz. (***) -> 0
Quite raw finish. A bit rubbery. (**) -> 0
Rubbery - rather oxidised. (*) -> 0
Each review becomes a row of binary word features (X) paired with its label (y):

Label (y) | lovely | good | raw | rubbery | rather | mushroomy | gamy | …
1         | 1      | 1    | 0   | 0       | 0      | 0         | 0    | …
1         | 1      | 0    | 0   | 0       | 0      | 0         | 1    | …
1         | 1      | 0    | 0   | 0       | 0      | 0         | 0    | …
0         | 0      | 0    | 1   | 1       | 0      | 0         | 0    | …
???       | 0      | 1    | 1   | 0       | 1      | 0         | 1    | …   (new review: predict its label)
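In practice a bag-of-words matrix like this can be built with scikit-learn's CountVectorizer. A minimal sketch (the column order sklearn produces will differ from the table above):

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "Lovely mushroomy nose and good length.",
    "Gamy, succulent tannins. Lovely.",
    "Provence herbs, creamy, lovely.",
    "Good if not dramatic fizz.",
    "Quite raw finish. A bit rubbery.",
    "Rubbery - rather oxidised.",
]
y = [1, 1, 1, 0, 0, 0]  # 1 = positive (4-5 stars), 0 = negative

# binary=True gives 0/1 word-presence features, like the table above
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())  # the word columns
print(X.toarray())                         # one row per review
```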
Bayes Rule

P(Y|X) = P(X|Y) P(Y) / P(X)
Applying Bayes Rule to the labeled matrix above is the idea behind Naive Bayes: estimate P(X|Y) and P(Y) from the labeled rows, then compute P(Y|X) for the unlabeled review.
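A hedged sketch of that computation using scikit-learn's BernoulliNB (a Naive Bayes variant for binary features), with the rows taken from the table above:

```python
from sklearn.naive_bayes import BernoulliNB

# Feature columns: [lovely, good, raw, rubbery, rather, mushroomy, gamy]
X = [[1, 1, 0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0]]
y = [1, 1, 1, 0]

nb = BernoulliNB()  # estimates P(word|label) and P(label) from the rows
nb.fit(X, y)

x_new = [[0, 1, 1, 0, 1, 0, 1]]  # the "???" row from the table
print(nb.predict(x_new))         # argmax over labels of P(X|Y) P(Y)
print(nb.predict_proba(x_new))   # P(Y|X), via Bayes rule
```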