

  1. K-Nearest Neighbors Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative • Check out review materials • Probability • Linear algebra • Python and NumPy • Start your HW 0 • On your local machine: install Anaconda, Jupyter Notebook • On the cloud: https://colab.research.google.com • Sign up for the Piazza discussion forum

  3. Enrollment • Maximum allowable capacity reached. (Figure: number of students vs. classroom capacity)

  4. Machine learning reading & study group • Reading group: Tuesday 11 AM - 12:00 PM, Location: Whittemore Hall 457B • Research paper reading: machine learning, computer vision • Study group: Thursday 11 AM - 12:00 PM, Location: Whittemore Hall 457B • Video lecture: machine learning • All are welcome. More info: https://github.com/vt-vl-lab/reading_group

  5. Recap: Machine learning algorithms • Supervised learning: classification (discrete outputs), regression (continuous outputs) • Unsupervised learning: clustering (discrete), dimensionality reduction (continuous)

  6. Today’s plan • Supervised learning • Setup • Basic concepts • K-Nearest Neighbor (kNN) • Distance metric • Pros/Cons of nearest neighbor • Validation, cross-validation, hyperparameter tuning

  7. Supervised learning • Input: x (images, texts, emails) • Output: y (e.g., spam or non-spam) • Data: (x(1), y(1)), (x(2), y(2)), ⋯, (x(N), y(N)) (labeled dataset) • (Unknown) target function f: X → Y ("true" mapping) • Model/hypothesis h: X → Y (learned model) • Learning = search in hypothesis space Slide credit: Dhruv Batra

  8. Training set → Learning algorithm → hypothesis h (maps input x to output y)

  9. Regression: training set → learning algorithm → hypothesis h; input x = size of house, output h(x) = estimated price

  10. Classification: training set → learning algorithm → hypothesis h; input x = unseen image, output h(x) = predicted object class (e.g., 'Mug') Image credit: CS231n @ Stanford

  11. Procedural view of supervised learning • Training stage: • Raw data → x (feature extraction) • Training data {(x, y)} → h (learning) • Testing stage: • Raw data → x (feature extraction) • Test data x → h(x) (apply function, evaluate error) Slide credit: Dhruv Batra

  12. Basic steps of supervised learning • Set up a supervised learning problem • Data collection: collect training data with the "right" answer • Representation: choose how to represent the data • Modeling: choose a hypothesis class H = {h: X → Y} • Learning/estimation: find the best hypothesis in the model class • Model selection: try different models; pick the best one (more on this later) • If happy, stop; else refine one or more of the above Slide credit: Dhruv Batra

  13. Nearest neighbor classifier • Training data (x(1), y(1)), (x(2), y(2)), ⋯, (x(N), y(N)) • Learning: do nothing • Testing: h(x) = y(k), where k = argmin_i D(x, x(i))
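The "learning does nothing, testing searches the training set" recipe above can be sketched in a few lines of NumPy. This is a minimal illustration; the function and variable names are mine, not from the slides.

```python
import numpy as np

def nn_predict(X_train, y_train, x_query):
    """1-NN: return the label of the training point closest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to each training point
    k = np.argmin(dists)                               # index of the nearest neighbor
    return y_train[k]

# Tiny 2-D example: two clusters, labeled 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
pred = nn_predict(X_train, y_train, np.array([4.8, 5.1]))  # query near the label-1 cluster
```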

  14. Face recognition Image credit: MegaFace

  15. Face recognition – surveillance application

  16. Music identification https://www.youtube.com/watch?v=TKNNOMddkNc

  17. Album recognition (Instance recognition) http://record-player.glitch.me/auth

  18. Scene Completion (C) Dhruv Batra [Hays & Efros, SIGGRAPH 2007]

  19. Hays and Efros, SIGGRAPH 2007

  20. … 200 total [Hays & Efros, SIGGRAPH 2007]

  21. Context Matching [Hays & Efros, SIGGRAPH 2007]

  22. Graph cut + Poisson blending [Hays & Efros, SIGGRAPH 2007]

  23.–28. Scene completion result images [Hays & Efros, SIGGRAPH 2007]

  29. Synonyms • Nearest Neighbors • k-Nearest Neighbors • Member of the following families: • Instance-based Learning • Memory-based Learning • Exemplar methods • Non-parametric methods Slide credit: Dhruv Batra

  30. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  31. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  32. Recall: 1-Nearest neighbor classifier • Training data (x(1), y(1)), (x(2), y(2)), ⋯, (x(N), y(N)) • Learning: do nothing • Testing: h(x) = y(k), where k = argmin_i D(x, x(i))

  33. Distance metrics (x: continuous variables) • L2-norm (Euclidean distance): D(x, x′) = sqrt(Σ_i (x_i − x′_i)²) • L1-norm (sum of absolute differences): D(x, x′) = Σ_i |x_i − x′_i| • L∞-norm: D(x, x′) = max_i |x_i − x′_i| • Scaled Euclidean distance: D(x, x′) = sqrt(Σ_i σ_i² (x_i − x′_i)²) • Mahalanobis distance: D(x, x′) = sqrt((x − x′)ᵀ A (x − x′))
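The metrics above differ only in how coordinate differences are aggregated, which a short NumPy sketch makes concrete (values and the diagonal matrix A below are illustrative; a diagonal A makes Mahalanobis reduce to scaled Euclidean):

```python
import numpy as np

x  = np.array([1.0, 2.0, 3.0])
xp = np.array([2.0, 0.0, 3.0])

l2   = np.sqrt(np.sum((x - xp) ** 2))   # Euclidean (L2)
l1   = np.sum(np.abs(x - xp))           # sum of absolute differences (L1)
linf = np.max(np.abs(x - xp))           # max coordinate difference (L-inf)

# Mahalanobis with a diagonal A: per-coordinate scaling
A = np.diag([1.0, 0.25, 4.0])
mahal = np.sqrt((x - xp) @ A @ (x - xp))
```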

  34. Distance metrics (x: discrete variables) • Example application: document classification • Hamming distance
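For discrete variables, the Hamming distance simply counts the positions where two equal-length sequences disagree; a minimal sketch (the example strings are a common illustration, not from the slides):

```python
def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    assert len(a) == len(b), "Hamming distance needs equal-length inputs"
    return sum(ai != bi for ai, bi in zip(a, b))

d = hamming("karolin", "kathrin")  # differs at 'r'/'t', 'o'/'h', 'l'/'r'
```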

  35. Distance metrics (x: histogram / PDF) • Histogram intersection: histint(x, x′) = 1 − Σ_i min(x_i, x′_i) • Chi-squared histogram matching distance: χ²(x, x′) = (1/2) Σ_i (x_i − x′_i)² / (x_i + x′_i) • Earth mover's distance (cross-bin similarity measure) [Rubner et al., IJCV 2000] • Minimal cost paid to transform one distribution into the other
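The two bin-wise distances above are straightforward to implement; a sketch on toy normalized histograms (the small epsilon guarding against empty bins is my addition):

```python
import numpy as np

def hist_intersection_dist(h1, h2):
    """1 minus histogram intersection; 0 for identical normalized histograms."""
    return 1.0 - np.sum(np.minimum(h1, h2))

def chi2_dist(h1, h2, eps=1e-12):
    """Chi-squared histogram distance: 0.5 * sum (h1-h2)^2 / (h1+h2)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.5, 0.3, 0.2])   # identical to h1
h3 = np.array([0.2, 0.3, 0.5])   # mass shifted to the last bin
```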

  36. Distance metrics (x: gene expression microarray data) • When "shape" matters more than values • Want D(x(1), x(2)) < D(x(1), x(3)) • How? • Correlation coefficients • Pearson, Spearman, Kendall, etc.
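Correlation-based similarity captures "same shape, different magnitude": a scaled and shifted copy of a profile has Pearson correlation 1 with the original, while a reversed profile is anti-correlated. A small sketch with made-up profiles:

```python
import numpy as np

g1 = np.array([1.0, 3.0, 2.0, 4.0])
g2 = 10.0 * g1 + 5.0                  # scaled/shifted copy: identical shape
g3 = np.array([4.0, 2.0, 3.0, 1.0])  # reversed shape

r12 = np.corrcoef(g1, g2)[0, 1]  # Pearson correlation with the copy
r13 = np.corrcoef(g1, g3)[0, 1]  # Pearson correlation with the reversal
```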

  37. Distance metrics (x: learnable features) • Large margin nearest neighbor (LMNN)

  38. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  39. kNN Classification k = 3 k = 5 Image credit: Wikipedia
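As in the k = 3 vs. k = 5 figure, the predicted class can flip with k. A minimal majority-vote kNN sketch (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k):
    """k-NN classification: majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())   # count labels among neighbors
    return votes.most_common(1)[0][0]

# 1-D data: three label-0 points far away, two label-1 points nearby
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_train = np.array([0, 0, 0, 1, 1])
pred3 = knn_predict(X_train, y_train, np.array([9.0]), k=3)  # 2 of 3 neighbors are label 1
pred5 = knn_predict(X_train, y_train, np.array([9.0]), k=5)  # 3 of 5 neighbors are label 0
```

Note how enlarging k lets the bigger but more distant class win the vote, which is exactly the skewed-class issue discussed on a later slide.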

  40. Classification decision boundaries Image credit: CS231n @ Stanford

  41. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  42. Issue: Skewed class distribution • Problem with majority voting in kNN • Intuition: nearby points should be weighted strongly, far points weakly • Apply weight w(i) = exp(−D(x(i), query)² / σ²) • σ²: kernel width
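Replacing the raw vote count with a sum of Gaussian weights implements this intuition: a large but distant class contributes almost nothing. A sketch (function name and data are illustrative):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, sigma=1.0):
    """Distance-weighted vote: each point votes with weight exp(-D^2 / sigma^2)."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)  # squared distances to all points
    w = np.exp(-d2 / sigma ** 2)                   # Gaussian weight per training point
    classes = np.unique(y_train)
    scores = np.array([w[y_train == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Three label-0 points far from the query vs. one label-1 point right next to it
X_train = np.array([[0.0], [0.2], [0.4], [5.0]])
y_train = np.array([0, 0, 0, 1])
pred = weighted_knn_predict(X_train, y_train, np.array([4.9]), sigma=1.0)
```

Plain majority voting over all four points would pick the skewed majority class 0; the distance weighting lets the single nearby point dominate.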

  43. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  44. 1-NN for Regression • Just predict the same output as the nearest neighbor, i.e., the closest data point Figure credit: Carlos Guestrin

  45. 1-NN for Regression • Often bumpy (overfits) Figure credit: Andrew Moore

  46. 9-NN for Regression • Predict the average of the k nearest neighbor values Figure credit: Andrew Moore

  47. Weighting/Kernel functions • Weight: w(i) = exp(−D(x(i), query)² / σ²) • Prediction (use all the data): y = Σ_i w(i) y(i) / Σ_i w(i) • (Our examples use a Gaussian kernel) Slide credit: Carlos Guestrin
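The weighted-average prediction over all training points is kernel regression in one line; a sketch with illustrative data:

```python
import numpy as np

def kernel_regression(X_train, y_train, x_query, sigma):
    """Predict a weighted average of ALL training outputs, weights exp(-D^2/sigma^2)."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-d2 / sigma ** 2)
    return np.sum(w * y_train) / np.sum(w)

# Points on the line y = x; by symmetry the prediction at x = 1 is exactly 1
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 2.0])
yhat = kernel_regression(X_train, y_train, np.array([1.0]), sigma=0.5)
```

As the next slide asks: for σ → ∞ all weights approach 1 and the prediction collapses to the global mean; for σ → 0 only the nearest point has non-negligible weight, recovering 1-NN.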

  48. Effect of Kernel Width • What happens as σ → ∞? • What happens as σ → 0? • Kernel regression Slide credit: Ben Taskar

  49. Problems with Instance-Based Learning • Expensive • No learning: most real work done during testing • For every test sample, must search through the entire dataset - very slow! • Must use tricks like approximate nearest neighbor search • Doesn't work well with a large number of irrelevant features • Distances overwhelmed by noisy features • Curse of dimensionality • Distances become meaningless in high dimensions Slide credit: Dhruv Batra

  50. Curse of dimensionality • Consider a hypersphere with radius r in dimension d • Consider a hypercube with edges of length 2r • The distance between the center and the corners is r√d • For large d, the hypercube consists almost entirely of the "corners"
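A quick Monte Carlo check makes the "almost entirely corners" claim concrete: the fraction of the cube [-r, r]^d that lies inside the inscribed sphere of radius r collapses toward 0 as d grows (this numerical experiment is my illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_fraction(d, n=100_000):
    """Estimate the fraction of the cube [-1,1]^d inside the unit sphere (r = 1 WLOG)."""
    pts = rng.uniform(-1.0, 1.0, size=(n, d))
    return np.mean(np.linalg.norm(pts, axis=1) <= 1.0)

f2, f10 = sphere_fraction(2), sphere_fraction(10)  # ~0.785 in 2-D, <1% in 10-D
corner_dist = np.sqrt(10)  # center-to-corner distance r*sqrt(d) for d = 10, r = 1
```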

  51. Hyperparameter selection • How to choose k? • Which distance metric should I use? L2, L1? • How large should the kernel width σ² be? • ….

  52. Tune hyperparameters on the test dataset? • It will give us stronger performance on the test set! • Why is this not okay? Let's discuss. • Evaluate on the test set only a single time, at the very end.

  53. Validation set • Split the training set: hold out a fake test set to tune hyperparameters Slide credit: CS231n @ Stanford

  54. Cross-validation • 5-fold cross-validation → split the training data into 5 equal folds • Use 4 of them for training and 1 for validation; rotate which fold is held out and average the results Slide credit: CS231n @ Stanford
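The fold-rotation procedure can be sketched without any library support; here it scores kNN for a given k on an illustrative two-cluster dataset (function names, data, and the seed are mine):

```python
import numpy as np

def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean validation accuracy of k-NN over n_folds rotating splits (a sketch)."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    accs = []
    for i in range(n_folds):
        val = folds[i]                                                  # held-out fold
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        correct = 0
        for idx in val:
            d = np.linalg.norm(X[train] - X[idx], axis=1)
            nearest = train[np.argsort(d)[:k]]
            pred = np.bincount(y[nearest]).argmax()  # majority vote over k neighbors
            correct += int(pred == y[idx])
        accs.append(correct / len(val))
    return float(np.mean(accs))

# Two well-separated Gaussian clusters, shuffled so each fold mixes both classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (25, 2)), rng.normal(5.0, 0.5, (25, 2))])
y = np.array([0] * 25 + [1] * 25)
perm = rng.permutation(50)
X, y = X[perm], y[perm]
acc = cross_val_accuracy(X, y, k=3)
```

In practice one would run this for several candidate k (and distance metrics), pick the best by validation accuracy, and only then touch the test set once.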
