Supervised Learning: Regression, Classification
Linear regression, k-NN classification

Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 11, 2014

An Example: Size of Engine vs Power
[Figure: scatter plot of power (bhp) against engine displacement (cc)]
§ An unknown car has an engine of size 1800 cc. What is the power of the engine likely to be?

An Example: Size of Engine vs Power
[Figure: scatter plot of power (bhp) against engine displacement (cc); power is marked as the target variable]
§ Intuitively, the two variables are related
§ Learn the relation from the given data
§ Predict the target variable after learning

Exercise: on a simpler set of data points
[Figure: scatter plot of y against x for the points in the table]

x     y
1     1
2     3
3     7
4     10
2.5   ?

§ Predict y for x = 2.5

Linear Regression
[Figure: training set plotted as power (bhp) against engine displacement (cc)]
§ Assume: the relation is linear
§ Then for a given x (= 1800), predict the value of y

Linear Regression
[Figure: scatter plot of power (bhp) against engine displacement (cc)]

Engine (cc)   Power (bhp)
800           60
1000          90
1200          80
1200          100
1200          75
1400          90
1500          120
1800          160
2000          140
2000          170
2400          180

Optional exercise
§ Linear regression
§ Assume $y = a \cdot x + b$
§ Try to find suitable $a$ and $b$

Exercise: using Linear Regression
[Figure: scatter plot of y against x for the points in the table]

x     y
1     1
2     3
3     7
4     10
2.5   ?

§ Define a regression line of your choice
§ Predict y for x = 2.5

Choosing the parameters right
[Figure: regression line through the data points, y against x. Goal: minimize the deviation from the actual data points]
§ The data points: $(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)$
§ The regression line: $f(x) = y = a \cdot x + b$
§ Least-squares cost function: $J = \sum_{i=1}^{m} (f(x_i) - y_i)^2$
§ Goal: minimize $J$ over choices of $a$ and $b$
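To make the cost function concrete, here is a minimal Python sketch that evaluates J and finds the a and b that minimize it for the four points of the exercise slide. It uses the standard closed-form solution for a single input variable, which is a different route from the iterative search the slides go on to describe; the function names `cost` and `fit_line` are illustrative, not from the lecture.

```python
# Data points from the exercise slide: (x, y) pairs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 7.0, 10.0]

def cost(a, b, xs, ys):
    """Least-squares cost J(a, b) = sum_i (a*x_i + b - y_i)^2."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form minimizer of J for one input variable (illustrative helper)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_mean - a * x_mean
    return a, b

a, b = fit_line(xs, ys)
prediction = a * 2.5 + b  # predicted y for x = 2.5
print(f"a = {a:.3f}, b = {b:.3f}, J = {cost(a, b, xs, ys):.3f}, f(2.5) = {prediction:.3f}")
```

For these points the fitted line is approximately y = 3.1x − 2.5, so the prediction at x = 2.5 is about 5.25. Any other choice of a and b gives a larger J.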

How to Minimize the Cost Function?
[Figure: the cost J plotted over the parameters a and b]
§ Goal: minimize $J$ over all values of $a$ and $b$
§ Start from some $a = a_0$ and $b = b_0$
§ Compute $J(a_0, b_0)$
§ Simultaneously change $a$ and $b$ in the direction of the negative gradient, and hope to eventually arrive at an optimum
§ Question: can there be more than one optimum?
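The procedure above can be sketched as plain gradient descent on the least-squares cost. This is a minimal illustration on the four exercise points, not the lecture's own code: the step size (0.01), the iteration count and the starting point (0, 0) are assumptions chosen so that the iteration converges on this data.

```python
# Data points from the exercise slide: (x, y) pairs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 7.0, 10.0]

def gradient(a, b, xs, ys):
    """Partial derivatives of J(a, b) = sum_i (a*x_i + b - y_i)^2."""
    errors = [a * x + b - y for x, y in zip(xs, ys)]
    grad_a = 2 * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2 * sum(errors)
    return grad_a, grad_b

def gradient_descent(xs, ys, a0=0.0, b0=0.0, step=0.01, iterations=5000):
    """Repeatedly move (a, b) a small step against the gradient of J."""
    a, b = a0, b0
    for _ in range(iterations):
        grad_a, grad_b = gradient(a, b, xs, ys)
        # Simultaneous update: both gradients were computed from the old (a, b).
        a, b = a - step * grad_a, b - step * grad_b
    return a, b

a, b = gradient_descent(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

On the closing question: for a straight-line model this cost is convex in a and b, so gradient descent with a small enough step cannot get stuck in a local minimum that is not global. Too large a step makes the iteration diverge instead.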

Another example:
[Figure: training set plotted as high blood sugar (Y / N) against age]
§ Given that a person's age is 24, predict if (s)he has high blood sugar
§ Discrete values of the target variable (Y / N)
§ Many ways of approaching this problem

Classification problem
[Figure: high blood sugar (Y / N) against age, with the new point at age 24 marked]
§ One approach: what other data points are nearest to the new point?
§ Other approaches?

Classification Algorithms
§ k-nearest neighbor classification
§ Naïve Bayes classification
§ Decision Tree
§ Linear Discriminant Analysis
§ Logistic Regression
§ Support Vector Machine

Classification or Regression?
Given data about some cars: engine size, number of seats, petrol / diesel, has airbag or not, price
§ Problem 1: Given the engine size of a new car, what is the price likely to be?
§ Problem 2: Given the engine size of a new car, is it likely that the car runs on petrol?
§ Problem 3: Given the engine size, is it likely that the car has airbags?

Classification

Example: Age, Income and Owning a flat
[Figure: training set plotted as monthly income (thousand rupees) against age; points are marked "owns a flat" or "does not own a flat"]
§ Given a new person's age and income, predict: does (s)he own a flat?

Example: Age, Income and Owning a flat
[Figure: training set plotted as monthly income (thousand rupees) against age; points are marked "owns a flat" or "does not own a flat"]
§ Nearest neighbor approach
§ Find nearest neighbors among the known data points and check their labels

Example: Age, Income and Owning a flat
[Figure: training set plotted as monthly income (thousand rupees) against age; points are marked "owns a flat" or "does not own a flat"]
§ The 1-Nearest Neighbor (1-NN) Algorithm:
– Find the closest point in the training set
– Output the label of the nearest neighbor

The k-Nearest Neighbor Algorithm
[Figure: training set plotted as monthly income (thousand rupees) against age; points are marked "owns a flat" or "does not own a flat"]
§ The k-Nearest Neighbor (k-NN) Algorithm:
– Find the k closest points in the training set
– Take a majority vote among the labels of the k points
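The two steps of k-NN can be sketched in a few lines of Python. The training points below are made up for illustration (the slide's actual data is only available as a plot), and `knn_classify` is a hypothetical helper name; Euclidean distance is assumed as the distance measure.

```python
from collections import Counter
import math

# Made-up training set: ((age, monthly income in thousand rupees), owns a flat?)
train = [
    ((25, 40), "no"), ((30, 60), "no"), ((35, 50), "no"), ((45, 90), "no"),
    ((40, 150), "yes"), ((50, 180), "yes"), ((55, 120), "yes"),
]

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(train, (48, 160), k=3))  # older, high income
print(knn_classify(train, (28, 45), k=3))   # younger, low income
```

With two classes, an odd k avoids tied votes. Setting k = 1 gives the 1-NN algorithm. Because age and income are on different scales, the raw Euclidean distance here is dominated by income; rescaling the features is a common remedy that this sketch leaves out.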

Distance measures
§ How to measure distance to find the closest points?
§ Euclidean distance between vectors $x = (x_1, \dots, x_k)$ and $y = (y_1, \dots, y_k)$: $d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$
§ Manhattan distance: $d(x, y) = \sum_{i=1}^{k} |x_i - y_i|$
§ Generalized squared interpoint distance, the Mahalanobis distance (1936): $D^2 = (x - y)^T S^{-1} (x - y)$, where $S$ is the covariance matrix
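The three measures can be written out directly. This is a minimal sketch in plain Python; the function names are illustrative, and the generalized squared distance takes the inverse covariance matrix `s_inv` as a ready-made argument (estimating and inverting S from data is assumed to happen elsewhere).

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_mahalanobis(x, y, s_inv):
    """(x - y)^T S^{-1} (x - y), with s_inv the inverse covariance as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Matrix-vector product S^{-1} (x - y), then the dot product with (x - y).
    weighted = [sum(s_inv[i][j] * diff[j] for j in range(n)) for i in range(n)]
    return sum(d * w for d, w in zip(diff, weighted))

x, y = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(x, y), manhattan(x, y), squared_mahalanobis(x, y, identity))
```

When S is the identity matrix, the generalized squared distance reduces to the squared Euclidean distance. A diagonal S rescales each coordinate by its variance, which reduces the effect of features measured on very different scales.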

Classification setup
§ Training data / set: set of input data points and the given answers for those data points
§ Labels: the list of possible answers
§ Test data / set: inputs to the classification algorithm for finding labels
– Used for evaluating the algorithm in case the answers are known (but not known to the algorithm)
§ Classification task: determining the labels of the data points for which the label is not known or not passed to the algorithm
§ Features: attributes that represent the data

Evaluation
§ Test set accuracy: the correct performance measure
§ Accuracy = (number of correct answers) / (number of all answers)
§ Need to know the true test labels
– Option: use the training set itself
– Parameter selection (for k-NN) by accuracy on the training set
§ Overfitting: a classifier performs much better on the training set than on new (unlabeled) test data
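A short sketch shows why accuracy on the training set can mislead. It computes accuracy as defined above and evaluates a 1-NN classifier on the very points it was given; the data is made up for illustration and the helper names are hypothetical.

```python
# Made-up training set: ((age, monthly income in thousand rupees), owns a flat?)
train = [
    ((25, 40), "no"), ((30, 60), "no"), ((35, 50), "no"), ((45, 90), "no"),
    ((40, 150), "yes"), ((50, 180), "yes"), ((55, 120), "yes"),
]

def accuracy(predicted, actual):
    """Number of correct answers divided by the number of all answers."""
    correct = sum(p == t for p, t in zip(predicted, actual))
    return correct / len(actual)

def nearest_label(train, query):
    """1-NN: label of the training point closest to `query` (squared Euclidean)."""
    closest = min(train, key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query)))
    return closest[1]

# Evaluate on the training set itself: every point is its own nearest neighbor.
predicted = [nearest_label(train, point) for point, _ in train]
actual = [label for _, label in train]
training_accuracy = accuracy(predicted, actual)
print(training_accuracy)
```

As long as no two training points coincide with different labels, 1-NN always scores a training accuracy of 1.0, whatever its accuracy on new data. Selecting k by training accuracy would therefore always favor k = 1.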

Better validation methods
§ Leave one out:
– For each data point x of the training set D
– Construct training set D − {x}, test set {x}
– Train on D − {x}, test on x
– Overall accuracy = average over all such cases
– Expensive to compute
§ Hold out set:
– Randomly choose x% (say 25-30%) of the training data and set it aside as the test set
– Train on the rest of the training data, test on the test set
– Easy to compute, but tends to have higher variance
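Both schemes can be sketched briefly. The example below uses a 1-NN classifier and made-up data; `leave_one_out_accuracy` and `hold_out_split` are hypothetical helper names, and the 25% test fraction and the fixed random seed are assumptions made so the run is reproducible.

```python
import random

# Made-up data set D: ((age, monthly income in thousand rupees), owns a flat?)
data = [
    ((25, 40), "no"), ((30, 60), "no"), ((35, 50), "no"), ((45, 90), "no"),
    ((40, 150), "yes"), ((50, 180), "yes"), ((55, 120), "yes"),
]

def nearest_label(train, query):
    """1-NN: label of the training point closest to `query` (squared Euclidean)."""
    closest = min(train, key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query)))
    return closest[1]

def leave_one_out_accuracy(data):
    """Average correctness when each point is tested against all the others."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # D - {x}
        if nearest_label(rest, point) == label:
            correct += 1
    return correct / len(data)

def hold_out_split(data, test_fraction=0.25, seed=0):
    """Randomly set aside a fraction of the data as the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]

loo = leave_one_out_accuracy(data)
train_part, test_part = hold_out_split(data)
print(f"leave-one-out accuracy: {loo:.3f}")
print(f"hold-out sizes: {len(train_part)} train, {len(test_part)} test")
```

Leave-one-out runs the classifier once per data point, which is why it becomes expensive on large sets. The hold-out estimate depends on which points happen to land in the test set, so a different seed can give a noticeably different accuracy.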

The k-fold Cross Validation Method
§ Randomly divide the training data D into k partitions $D_1, \dots, D_k$, as equal in size as possible
§ For each fold $D_i$
– Train a classifier with training data = $D - D_i$
– Test and validate with $D_i$
§ Overall accuracy: average accuracy over all cases
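A minimal sketch of the procedure, assuming a classifier given as a function `classify(train, point)` that returns a label. The data is made up, 1-NN stands in for the classifier, and `k_fold_indices` and `cross_validate` are hypothetical helper names; k must not exceed the number of data points.

```python
import random

# Made-up data set D: ((age, monthly income in thousand rupees), owns a flat?)
data = [
    ((25, 40), "no"), ((30, 60), "no"), ((35, 50), "no"), ((45, 90), "no"),
    ((40, 150), "yes"), ((50, 180), "yes"), ((55, 120), "yes"),
]

def nearest_label(train, query):
    """1-NN: label of the training point closest to `query` (squared Euclidean)."""
    closest = min(train, key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], query)))
    return closest[1]

def k_fold_indices(m, k, seed=0):
    """Shuffle the indices 0..m-1 and deal them into k folds of nearly equal size."""
    indices = list(range(m))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(data, k, classify, seed=0):
    """Average accuracy over k folds; each fold is tested on a model trained on the rest."""
    accuracies = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [item for i, item in enumerate(data) if i not in held_out]  # D - D_i
        test = [data[i] for i in fold]                                      # D_i
        correct = sum(classify(train, point) == label for point, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

print(cross_validate(data, k=3, classify=nearest_label))
```

Setting k equal to the number of data points makes every fold a single point, which is exactly the leave-one-out method. Smaller k means fewer training runs at the cost of a coarser estimate.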

