K-Nearest Neighbors Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824
Administrative • Check out review materials • Probability • Linear algebra • Python and NumPy • Start your HW 0 • On your local machine: install Anaconda, Jupyter notebook • On the cloud: https://colab.research.google.com • Sign up for the Piazza discussion forum
Enrollment • Maximum allowable classroom capacity reached.
Machine learning reading & study group • Reading Group: Tuesday 11 AM - 12:00 PM, Location: Whittemore Hall 457B • Research paper reading: machine learning, computer vision • Study Group: Thursday 11 AM - 12:00 PM, Location: Whittemore Hall 457B • Video lectures: machine learning. All are welcome. More info: https://github.com/vt-vl-lab/reading_group
Recap: Machine learning algorithms • Supervised learning: Classification (discrete output), Regression (continuous output) • Unsupervised learning: Clustering (discrete), Dimensionality reduction (continuous)
Today’s plan • Supervised learning • Setup • Basic concepts • K-Nearest Neighbor (kNN) • Distance metric • Pros/Cons of nearest neighbor • Validation, cross-validation, hyperparameter tuning
Supervised learning • Input: x (images, texts, emails) • Output: y (e.g., spam or non-spam) • Data: (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) (labeled dataset) • (Unknown) target function f: X → Y (“true” mapping) • Model/hypothesis h: X → Y (learned model) • Learning = search in hypothesis space Slide credit: Dhruv Batra
Training set → Learning Algorithm → Hypothesis h; new input x → h → output y
Regression • Training set → Learning Algorithm → Hypothesis h • Size of house x → h → Estimated price y
Classification • Training set → Learning Algorithm → Hypothesis h • Unseen image x → h → Predicted object class (‘Mug’) Image credit: CS231n @ Stanford
Procedural view of supervised learning • Training stage: • Raw data → x (feature extraction) • Training data {(x, y)} → h (learning) • Testing stage: • Raw data → x (feature extraction) • Test data x → h(x) (apply function, evaluate error) Slide credit: Dhruv Batra
Basic steps of supervised learning • Set up a supervised learning problem • Data collection: collect training data with the “right” answers • Representation: choose how to represent the data • Modeling: choose a hypothesis class H = {h: X → Y} • Learning/estimation: find the best hypothesis in the model class • Model selection: try different models and pick the best one (more on this later) • If happy, stop; else refine one or more of the above Slide credit: Dhruv Batra
Nearest neighbor classifier • Training data (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) • Learning: do nothing • Testing: h(x) = y^(k), where k = argmin_i D(x, x^(i))
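A rough sketch of this rule in NumPy (the course uses Python/NumPy; the function name `nn_predict` and the choice of Euclidean distance for D are illustrative assumptions, not part of the slides):

```python
import numpy as np

def nn_predict(X_train, y_train, x_query):
    """1-nearest-neighbor prediction for a single query point.

    X_train: (N, d) array of training features
    y_train: (N,) array of training labels
    x_query: (d,) query feature vector
    """
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Index of the closest training point
    nn_index = np.argmin(dists)
    return y_train[nn_index]
```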
Face recognition Image credit: MegaFace
Face recognition – surveillance application
Music identification https://www.youtube.com/watch?v=TKNNOMddkNc
Album recognition (Instance recognition) http://record-player.glitch.me/auth
Scene Completion (C) Dhruv Batra [Hays & Efros, SIGGRAPH 2007]
Hays and Efros, SIGGRAPH 2007
… 200 total [Hays & Efros, SIGGRAPH 2007]
Context Matching [Hays & Efros, SIGGRAPH 2007]
Graph cut + Poisson blending [Hays & Efros, SIGGRAPH 2007]
Synonyms • Nearest Neighbors • k-Nearest Neighbors • Member of following families: • Instance-based Learning • Memory-based Learning • Exemplar methods • Non-parametric methods Slide credit: Dhruv Batra
Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin
Recall: 1-Nearest neighbor classifier • Training data (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) • Learning: do nothing • Testing: h(x) = y^(k), where k = argmin_i D(x, x^(i))
Distance metrics (x: continuous variables) • L2-norm (Euclidean distance): D(x, x′) = √(Σ_i (x_i − x′_i)²) • L1-norm (sum of absolute differences): D(x, x′) = Σ_i |x_i − x′_i| • L∞-norm: D(x, x′) = max_i |x_i − x′_i| • Scaled Euclidean distance: D(x, x′) = √(Σ_i σ_i² (x_i − x′_i)²) • Mahalanobis distance: D(x, x′) = √((x − x′)ᵀ A (x − x′))
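A sketch of these metrics in NumPy (function names are placeholders; `sigma` and `A` stand in for the per-dimension scales σ_i and the Mahalanobis matrix A above):

```python
import numpy as np

def l2_dist(x, xp):
    return np.sqrt(np.sum((x - xp) ** 2))   # Euclidean distance

def l1_dist(x, xp):
    return np.sum(np.abs(x - xp))           # sum of absolute differences

def linf_dist(x, xp):
    return np.max(np.abs(x - xp))           # largest coordinate difference

def scaled_l2_dist(x, xp, sigma):
    # sigma: per-dimension scale factors sigma_i
    return np.sqrt(np.sum((sigma * (x - xp)) ** 2))

def mahalanobis_dist(x, xp, A):
    # A: positive semi-definite matrix, e.g. the inverse covariance of the data
    diff = x - xp
    return np.sqrt(diff @ A @ diff)
```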
Distance metrics (x: discrete variables) • Example application: document classification • Hamming distance
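A minimal sketch of Hamming distance for discrete feature vectors (the binary bag-of-words example is hypothetical):

```python
import numpy as np

def hamming_dist(x, xp):
    # Number of positions where the two discrete feature vectors disagree
    return int(np.sum(np.asarray(x) != np.asarray(xp)))

# e.g. binary bag-of-words vectors for two documents
hamming_dist([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])  # -> 2
```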
Distance metrics (x: histogram / PDF) • Histogram intersection: histint(x, x′) = 1 − Σ_i min(x_i, x′_i) • Chi-squared histogram matching distance: χ²(x, x′) = ½ Σ_i (x_i − x′_i)² / (x_i + x′_i) • Earth mover’s distance (cross-bin similarity measure) [Rubner et al. IJCV 2000]: minimal cost paid to transform one distribution into the other
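A possible NumPy version of the first two bin-to-bin measures (assuming x and x′ are normalized histograms; the `eps` term is an added safeguard against empty bins, not part of the slide):

```python
import numpy as np

def hist_intersection_dist(x, xp):
    # 1 minus the overlap of the two normalized histograms
    return 1.0 - np.sum(np.minimum(x, xp))

def chi2_dist(x, xp, eps=1e-12):
    # Chi-squared histogram matching distance; eps avoids division by zero
    return 0.5 * np.sum((x - xp) ** 2 / (x + xp + eps))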
Distance metrics (x: gene expression microarray data) • When “shape” matters more than values • Want D(x^(1), x^(2)) < D(x^(1), x^(3)) • How? Correlation coefficients: Pearson, Spearman, Kendall, etc.
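One way to turn a correlation coefficient into a distance, as a sketch (Pearson correlation via np.corrcoef; the “1 − r” construction is an illustrative choice, not prescribed by the slide):

```python
import numpy as np

def correlation_dist(x, xp):
    # Pearson correlation r is in [-1, 1]; subtracting from 1 makes
    # profiles with the same "shape" have distance close to 0
    r = np.corrcoef(x, xp)[0, 1]
    return 1.0 - r
```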
Distance metrics ( 𝑦 : Learnable feature) Large margin nearest neighbor (LMNN)
Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin
kNN Classification k = 3 k = 5 Image credit: Wikipedia
Classification decision boundaries Image credit: CS231 @ Stanford
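A sketch of kNN classification by majority vote, assuming Euclidean distance and arbitrary tie-breaking (names are placeholders):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """k-NN classification: majority vote among the k closest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nn_indices = np.argsort(dists)[:k]                 # indices of k nearest neighbors
    votes = Counter(y_train[nn_indices])
    return votes.most_common(1)[0][0]                  # majority label (ties broken arbitrarily)
```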
Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin
Issue: Skewed class distribution • Problem with majority voting in kNN • Intuition: nearby points should be weighted strongly, far points weakly • Apply weight w^(i) = exp(−D(x^(i), query)² / σ²) • σ²: kernel width
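A sketch of distance-weighted voting under these assumptions (Gaussian weights with kernel width σ², summed per class; names are placeholders):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, sigma2=1.0):
    """Classify by summing Gaussian weights exp(-D^2 / sigma^2) per class."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    weights = np.exp(-dists ** 2 / sigma2)
    classes = np.unique(y_train)
    # Total weighted vote received by each class
    class_scores = [weights[y_train == c].sum() for c in classes]
    return classes[np.argmax(class_scores)]
```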
Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin
1-NN for Regression • Just predict the same output as the nearest neighbor, i.e., the closest training datapoint Figure credit: Carlos Guestrin
1-NN for Regression • Often bumpy (overfits) Figure credit: Andrew Moore
9-NN for Regression • Predict the average of the k nearest neighbor values Figure credit: Andrew Moore
Weighting/Kernel functions • Weight: w^(i) = exp(−D(x^(i), query)² / σ²) • Prediction (use all the data): y = Σ_i w^(i) y^(i) / Σ_i w^(i) • (Our examples use a Gaussian kernel) Slide credit: Carlos Guestrin
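A minimal sketch of this kernel regression estimate (Nadaraya–Watson style), assuming a Gaussian kernel and Euclidean distance:

```python
import numpy as np

def kernel_regression(X_train, y_train, x_query, sigma2=1.0):
    """Weighted average of all training targets, with Gaussian weights."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    w = np.exp(-dists ** 2 / sigma2)        # weight for every training point
    return np.sum(w * y_train) / np.sum(w)  # normalized weighted average
```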
Effect of Kernel Width (kernel regression) • What happens as σ → ∞? • What happens as σ → 0? Slide credit: Ben Taskar
Problems with Instance-Based Learning • Expensive • No learning: most real work done during testing • For every test sample, must search through the entire dataset – very slow! • Must use tricks like approximate nearest neighbor search • Doesn’t work well with a large number of irrelevant features • Distances are overwhelmed by noisy features • Curse of dimensionality • Distances become meaningless in high dimensions Slide credit: Dhruv Batra
Curse of dimensionality • Consider a hypersphere with radius r in dimension d • Consider a hypercube with edges of length 2r • The distance between the center and a corner is r√d • As d grows, the hypercube consists almost entirely of its “corners”
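A small numeric illustration of this effect: the fraction of the hypercube’s volume occupied by the inscribed hypersphere collapses toward zero as d grows (the ball-volume formula is standard; the helper name is hypothetical):

```python
from math import gamma, pi

def sphere_to_cube_volume_ratio(d, r=1.0):
    # Volume of a d-dimensional ball of radius r divided by the volume
    # of the enclosing hypercube with edge length 2r
    v_sphere = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d
    v_cube = (2 * r) ** d
    return v_sphere / v_cube

for d in [2, 3, 10, 20]:
    print(d, sphere_to_cube_volume_ratio(d))
# Ratio shrinks rapidly: ~0.785, ~0.524, ~0.0025, ~2.5e-8
```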
Hyperparameter selection • How to choose k? • Which distance metric should I use? L2, L1? • How large should the kernel width σ² be? • …
Tune hyperparameters on the test dataset? • It will give us stronger performance on the test set! • Why is this not okay? Let’s discuss • Evaluate on the test set only a single time, at the very end.
Validation set • Split the training set: hold out a fake test set to tune hyperparameters Slide credit: CS231 @ Stanford
Cross-validation • 5-fold cross-validation: split the training data into 5 equal folds • Use 4 of them for training and 1 for validation Slide credit: CS231 @ Stanford
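A sketch of using k-fold cross-validation to pick k, reusing the `knn_predict` sketch from earlier (the splitting scheme and names are illustrative):

```python
import numpy as np

def cross_validate_k(X, y, k_values, num_folds=5):
    """Return the k with the highest average validation accuracy."""
    # Shuffle the indices and split them into num_folds roughly equal folds
    folds = np.array_split(np.random.permutation(len(X)), num_folds)
    best_k, best_acc = None, -1.0
    for k in k_values:
        accs = []
        for i in range(num_folds):
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != i])
            preds = [knn_predict(X[train_idx], y[train_idx], x, k=k) for x in X[val_idx]]
            accs.append(np.mean(np.asarray(preds) == y[val_idx]))
        if np.mean(accs) > best_acc:
            best_k, best_acc = k, np.mean(accs)
    return best_k, best_acc
```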