Applied Machine Learning Nearest Neighbours Siamak Ravanbakhsh COMP 551 (Fall 2020)
Admin: Arnab is the head TA (contact: arnab.mondal@mail.mcgill.ca); send all your questions to Arnab. If a question is relevant to other students, you can post it in the forum; he will decide whether to bring someone else into the loop. For team-formation issues, we will put students outside EST who are in close time zones in contact. Team TAs: Samin (samin.arnob@mail.mcgill.ca), Tianyu (tianyu.li@mail.mcgill.ca)
Admin: the first tutorial (Python-NumPy), given by Amy (amy.x.zhang@mail.mcgill.ca), is this Thursday 4:30-6 pm. It will be recorded and the material will be posted. TA office hours will be posted this week, along with an update about class capacity.
Objectives: variations of k-nearest neighbours for classification and regression; computational complexity; some pros and cons of K-NN; what is a hyper-parameter?
Nearest neighbour classifier. Training: do nothing (a lazy learner, also a non-parametric model). Test: predict the label by finding the most similar example in the training set. Try similarity-based classification yourself: is this a kind of (a) stork, (b) pigeon, (c) penguin? Is this calligraphy from (a) east Asia, (b) Africa, (c) the middle east? Accretropin: is it (a) an east European actor, (b) a drug, (c) a gum brand? An example of nearest neighbour regression: pricing based on similar items (e.g., used in the housing market).
Nearest neighbour classifier. Training: do nothing (a lazy learner). Test: predict the label by finding the most similar example in the training set; this needs a measure of distance (e.g., a metric). Examples for real-valued feature vectors:
Euclidean distance: $D_{\text{Euclidean}}(x, x') = \sqrt{\sum_{d=1}^{D} (x_d - x'_d)^2}$
Manhattan distance: $D_{\text{Manhattan}}(x, x') = \sum_{d=1}^{D} |x_d - x'_d|$
Minkowski distance: $D_{\text{Minkowski}}(x, x') = \left(\sum_{d=1}^{D} |x_d - x'_d|^p\right)^{1/p}$
Cosine similarity: $D_{\text{Cosine}}(x, x') = \frac{x^\top x'}{\lVert x \rVert \, \lVert x' \rVert}$
For discrete feature vectors:
Hamming distance: $D_{\text{Hamming}}(x, x') = \sum_{d=1}^{D} \mathbb{I}(x_d \neq x'_d)$
... and there are metrics for strings, distributions, etc.
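As a concrete illustration, here is a minimal NumPy sketch of these metrics; the function names are my own, not from the slides.

```python
import numpy as np

def euclidean(x, xp):
    return np.sqrt(np.sum((x - xp) ** 2))

def manhattan(x, xp):
    return np.sum(np.abs(x - xp))

def minkowski(x, xp, p=3):
    return np.sum(np.abs(x - xp) ** p) ** (1.0 / p)

def cosine_similarity(x, xp):
    return x @ xp / (np.linalg.norm(x) * np.linalg.norm(xp))

def hamming(x, xp):
    # counts the coordinates where the discrete features disagree
    return np.sum(x != xp)
```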
Iris dataset: N = 150 instances of flowers, one of the most famous datasets in statistics; D = 4 features, C = 3 classes. For better visualization, we use only two features. Input $x^{(n)} \in \mathbb{R}^2$, label $y^{(n)} \in \{1, 2, 3\}$, where $n \in \{1, \dots, N\}$ indexes the training instance (sometimes we drop $(n)$). Using Euclidean distance, the nearest neighbour classifier gets 68% accuracy in classifying the test instances.
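A sketch of this setup, assuming the standard scikit-learn Iris loader, an arbitrary choice of two features, and a default train/test split; the exact features and split used in the slides may differ, so the accuracy need not match the 68% quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                 # N=150, D=4, C=3
X = X[:, :2]                                      # keep only two features for visualization
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=1, metric='euclidean')  # 1-NN with Euclidean distance
clf.fit(X_tr, y_tr)                               # "training" just stores the data
print(clf.score(X_te, y_te))                      # test accuracy
```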
Decision boundary: a classifier partitions the input space into decision regions; all points inside a region are assigned the same class, and the boundary between regions is the decision boundary. The Voronoi diagram visualizes the decision boundary of the nearest neighbour classifier: each cell (color) contains all points closer to the corresponding training instance than to any other instance.
Higher dimensions: digits dataset. Input $x^{(n)} \in \{0, \dots, 255\}^{28 \times 28}$ (the size of the input image in pixels), label $y^{(n)} \in \{0, \dots, 9\}$, where $n \in \{1, \dots, N\}$ indexes the training instance (sometimes we drop $(n)$). Vectorization: $x \to \mathrm{vec}(x) \in \mathbb{R}^{784}$, so the input dimension is $D = 784$ (pretending intensities are real numbers). image: https://medium.com/@rajatjain0807/machine-learning-6ecde3bfd2f4
K-Nearest Neighbour (K-NN) classifier. Training: do nothing. Test: find the nearest image in the training set; we are using Euclidean distance in a 784-dimensional space to find the closest neighbour. Can we make the predictions more robust? Consider the K nearest neighbours and label by the majority. We can even estimate the probability of each class: $p(y_{\text{new}} = c \mid x_{\text{new}}) = \frac{1}{K} \sum_{x^{(k)} \in \text{KNN}(x_{\text{new}})} \mathbb{I}(y^{(k)} = c)$. [figure: a new test instance with its closest instances; here $p(y_{\text{new}} = 6 \mid x_{\text{new}}) = \frac{6}{9}$]
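A from-scratch sketch of this classifier (function and variable names are illustrative, not from the slides); `X_train` holds the vectorized images, e.g. rows in $\mathbb{R}^{784}$.

```python
import numpy as np

def knn_predict_proba(X_train, y_train, x_new, K=5, n_classes=10):
    # Euclidean distance from the test point to every training point: O(ND)
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # indices of the K closest training instances
    nn = np.argsort(dists)[:K]
    # p(y_new = c | x_new) = (1/K) * sum over the K neighbours of I(y^(k) = c)
    counts = np.bincount(y_train[nn], minlength=n_classes)
    return counts / K

def knn_predict(X_train, y_train, x_new, K=5, n_classes=10):
    # label by the majority among the K nearest neighbours
    return np.argmax(knn_predict_proba(X_train, y_train, x_new, K, n_classes))
```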
Choice of K: K is a hyper-parameter of our model; in contrast to parameters, hyper-parameters are not learned during the usual training procedure. Example accuracies: K = 1 gives 76%, K = 5 gives 84%, K = 15 gives 78%.
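A sketch of selecting the hyper-parameter K on held-out data; scikit-learn's small 8x8 digits dataset stands in for the 28x28 digits used in the slides, so the accuracies will not match the numbers above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for K in [1, 5, 15]:
    clf = KNeighborsClassifier(n_neighbors=K).fit(X_tr, y_tr)
    print(K, clf.score(X_val, y_val))   # pick the K with the best validation accuracy
```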
Computational complexity: the computational complexity for a single test query is O(ND + NK). For each point in the training set, calculate the distance in O(D), for a total of O(ND); then find the K points with the smallest distances in O(NK). Bonus: in practice, efficient implementations using KD-trees (and ball-trees) exist; they partition the space based on a tree structure, and for a query point only the relevant part of the space is searched.
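A sketch of how such tree-based implementations are typically accessed in scikit-learn via the `algorithm` argument; the dataset and parameters here are illustrative, and for high-dimensional data the speed-up over brute force may be small.

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

brute = KNeighborsClassifier(n_neighbors=5, algorithm='brute').fit(X, y)      # O(ND) per query
kdtree = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree').fit(X, y)   # tree partition of the space
balltree = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree').fit(X, y)
```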
Scaling and importance of features: scaling of features affects distances and therefore the nearest neighbours. Example: the feature sepal width is scaled by 100, so closeness in this dimension becomes more important in finding the nearest neighbour.
Scaling and importance of features: we want important features to maximally affect the classification, so they should have a larger scale; noisy and irrelevant features should have a small scale. K-NN is not adaptive to feature scaling, and it is sensitive to noisy features. Example: add a feature that is random noise to the previous example and plot the effect of the scale of the noise feature on accuracy.
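A common remedy, sketched below under the assumption that all features are equally relevant, is to standardize features before K-NN so that no feature dominates the distance purely because of its scale; note this does not by itself remove the effect of noisy features. The pipeline is a standard scikit-learn pattern, not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# standardize each feature (zero mean, unit variance) before computing distances
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```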
K-NN regression: so far our task was classification, using the majority vote of the neighbours for prediction at test time. The change for regression is minimal: use the mean (or median) of the K nearest neighbours' targets. Example: D = 1, K = 5 (example from scikit-learn.org).
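A minimal sketch of K-NN regression with D = 1 and K = 5, in the spirit of the scikit-learn example cited; the synthetic data below is illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)   # D = 1 inputs
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)      # noisy targets

reg = KNeighborsRegressor(n_neighbors=5)                # predict the mean of the 5 nearest targets
reg.fit(X, y)
print(reg.predict([[2.5]]))
```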
Some variations: in weighted K-NN the neighbours are weighted inversely proportional to their distance; for classification the votes are weighted, and for regression we calculate the weighted average. In fixed-radius nearest neighbours, all neighbours within a fixed radius are considered, so in dense neighbourhoods we get more neighbours (example from scikit-learn.org).
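A sketch of both variations using scikit-learn's built-in options; the weighting scheme shown (1/distance) and the radius value are illustrative choices, and the same options exist for the classifier counterparts.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

X = np.linspace(0, 5, 40).reshape(-1, 1)
y = np.sin(X).ravel()

# weighted K-NN: neighbours are averaged with weight inversely proportional to distance
weighted = KNeighborsRegressor(n_neighbors=5, weights='distance').fit(X, y)

# fixed-radius neighbours: every training point within the radius is used,
# so dense neighbourhoods contribute more neighbours
fixed_radius = RadiusNeighborsRegressor(radius=0.5).fit(X, y)

print(weighted.predict([[2.5]]), fixed_radius.predict([[2.5]]))
```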
Summary: K-NN performs classification/regression by finding similar instances in the training set. We need a notion of distance, a choice of how many neighbours to consider (a fixed K, or a fixed radius), and a choice of how to weight the neighbours. K-NN is a non-parametric method and a lazy learner: non-parametric because the model has no fixed set of parameters (in effect, the training data points act as the parameters); lazy because we don't do anything during training, so the test-time complexity grows with the size of the data. K-NN is sensitive to feature scaling and noise.