SLIDE 1
  • 8. Nearest neighbors

Foundations of Machine Learning, CentraleSupélec Paris, Fall 2017
Chloé-Agathe Azencott

Centre for Computational Biology, Mines ParisTech

chloe-agathe.azencott@mines-paristech.fr

SLIDE 2

Practical matters

  • Class representatives

– William PALMER william.palmer@student.ecp.fr
– Léonard BOUSSIOUX leonard.boussioux@student.ecp.fr

  • Kaggle project
SLIDE 3

Learning objectives

  • Implement the nearest-neighbor and k-nearest-neighbors algorithms.
  • Compute distances between real-valued vectors as well as objects represented by categorical features.
  • Define the decision boundary of the nearest-neighbor algorithm.
  • Explain why kNN might not work well in high dimension.

SLIDE 4

Nearest neighbors

SLIDE 5

  • How would you color the blank circles?
SLIDE 6

  • How would you color the blank circles?
SLIDE 7

Partitioning the space

The training data partitions the entire space.

SLIDE 8

Nearest neighbor

  • Learning:

– Store all the training examples

  • Prediction:

– For x: the label of the training example closest to it
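
To make the two steps concrete, here is a minimal 1-NN sketch in Python/NumPy (not from the slides; the toy data and the choice of the Euclidean distance are illustrative assumptions):

```python
import numpy as np

# Learning: just store the training examples (toy data, assumed for illustration).
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
y_train = np.array([0, 0, 1])

def predict_1nn(x, X_train, y_train):
    # Compute the distance from x to every stored training example...
    dists = np.linalg.norm(X_train - x, axis=1)
    # ...and return the label of the closest one.
    return y_train[np.argmin(dists)]

print(predict_1nn(np.array([2.5, 2.5]), X_train, y_train))  # -> 1
```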

SLIDE 9

k nearest neighbors

  • Learning:

– Store all the training examples

  • Prediction:

– Find the k training examples closest to x
– Classification?

SLIDE 10

k nearest neighbors

  • Learning:

– Store all the training examples

  • Prediction:

– Find the k training examples closest to x
– Classification

Majority vote: Predict the class of the most frequent label among the k neighbors.

SLIDE 11

k nearest neighbors

  • Learning:

– Store all the training examples

  • Prediction:

– Find the k training examples closest to x
– Classification

Majority vote: Predict the class of the most frequent label among the k neighbors.

– Regression?

SLIDE 12

k nearest neighbors

  • Learning:

– Store all the training examples

  • Prediction:

– Find the k training examples closest to x
– Classification

Majority vote: Predict the class of the most frequent label among the k neighbors.

– Regression

Predict the average of the labels of the k neighbors.
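
A sketch of both prediction rules in plain NumPy; the function and variable names are mine, not the course's:

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3, classification=True):
    # Find the k training examples closest to x (Euclidean distance).
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors_y = y_train[np.argsort(dists)[:k]]
    if classification:
        # Majority vote: most frequent label among the k neighbors.
        return Counter(neighbors_y).most_common(1)[0][0]
    # Regression: average of the labels of the k neighbors.
    return neighbors_y.mean()

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y = np.array([0, 0, 0, 1])
print(knn_predict(np.array([0.2, 0.2]), X, y, k=3))                         # -> 0 (vote)
print(knn_predict(np.array([0.2, 0.2]), X, y, k=3, classification=False))   # -> 0.0 (average)
```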

SLIDE 13

Choice of k

  • Small k: noisy

The idea behind using more than 1 neighbor is to average out the noise

  • Large k: computationally intensive

If k = n ?

SLIDE 14

Choice of k

  • Small k: noisy

The idea behind using more than 1 neighbor is to average out the noise

  • Large k: computationally intensive

If k = n, then we predict

– for classification: the majority class
– for regression: the average value

  • Set k by cross-validation
  • Heuristic: k ≈ √n
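
For instance, with scikit-learn (used in the lab), k can be selected by cross-validation; the dataset and the candidate values of k below are made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Try several values of k and keep the one with the best cross-validated score.
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [1, 3, 5, 7, 11, 15]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)       # cross-validated choice of k
print(int(np.sqrt(len(X))))    # heuristic k ~ sqrt(n), for comparison
```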
SLIDE 15

Non-parametric learning

Non-parametric learning algorithm:

– the complexity of the decision function grows with the number of data points.

– contrast with linear regression (≈ as many parameters as features).

– Usually: the decision function is expressed directly in terms of the training examples.

– Examples:

  • kNN (this chapter)
  • tree-based methods (Chap. 9)
  • SVM (Chap. 10)
SLIDE 16

Instance-based learning

  • Learning:

– Storing training instances.

  • Predicting:

– Compute the label for a new instance based on its similarity with the stored instances.

  • Also called lazy learning.
  • Similar to case-based reasoning

– Doctors treating a patient based on how patients with similar symptoms were treated,

– Judges ruling court cases based on legal precedent.

SLIDE 17

Instance-based learning

  • Learning:

– Storing training instances.

  • Predicting:

– Compute the label for a new instance based on its similarity with the stored instances.

  • Also called lazy learning.
  • Similar to case-based reasoning

– Doctors treating a patient based on how patients with similar symptoms were treated,

– Judges ruling court cases based on legal precedent.

where the magic happens!

SLIDE 18

Computing distances & similarities

SLIDE 19

Distances between instances

  • Distance
SLIDE 20

Distances between instances

  • Distance
SLIDE 21

Distances between instances

  • Euclidean distance
SLIDE 22

Distances between instances

  • Euclidean distance
  • Manhattan distance

Why is this called the Manhattan distance?

SLIDE 23

Distances between instances

  • Euclidean distance
  • Manhattan distance
  • Lq-norm: Minkowski distance

– L1 = Manhattan.
– L2 = Euclidean.
– L∞ ?

SLIDE 24

Distances between instances

  • Euclidean distance
  • Manhattan distance
  • Lq-norm: Minkowski distance

– L1 = Manhattan.
– L2 = Euclidean.
– L∞ = Chebyshev distance (maximum absolute coordinate difference).
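
As a sketch, the same distances computed with scipy.spatial.distance (the vectors are arbitrary examples):

```python
from scipy.spatial import distance

x = [1.0, 2.0, 3.0]
y = [2.0, 0.0, 3.0]

print(distance.euclidean(x, y))        # L2
print(distance.cityblock(x, y))        # L1 (Manhattan)
print(distance.minkowski(x, y, p=3))   # general Lq (here q = 3)
print(distance.chebyshev(x, y))        # L-infinity: max coordinate difference
```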

SLIDE 25

Similarity between instances

  • Pearson's correlation
  • Assuming the data is centered

Geometric interpretation?

SLIDE 26

Similarity between instances

  • Pearson's correlation (centered data)
  • Cosine similarity: the dot product can be used to measure similarities.
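
A quick numerical check of the claim that, on centered data, Pearson's correlation is exactly the cosine similarity (toy vectors, NumPy only):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product normalized by the vector norms.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1.0, 3.0, 5.0, 2.0])
y = np.array([2.0, 2.5, 6.0, 1.0])

pearson = np.corrcoef(x, y)[0, 1]
cosine_centered = cosine_similarity(x - x.mean(), y - y.mean())

print(pearson, cosine_centered)   # identical: Pearson = cosine of the centered vectors
```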

SLIDE 27

Categorical features

  • Ex: a feature that can take 5 values

– Sports
– World
– Culture
– Internet
– Politics

  • Naive encoding: x1 in {1, 2, 3, 4, 5}:

– Why is Sports closer to World than to Politics?

  • One-hot encoding: x1, x2, x3, x4, x5

– Sports: [1, 0, 0, 0, 0]
– Internet: [0, 0, 0, 1, 0]
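
One possible implementation of the one-hot encoding with scikit-learn's OneHotEncoder (note that it orders the columns alphabetically, so the vectors may differ from the slide's ordering):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

topics = np.array([["Sports"], ["World"], ["Culture"], ["Internet"], ["Politics"]])

enc = OneHotEncoder(sparse_output=False)   # use sparse=False on older scikit-learn versions
one_hot = enc.fit_transform(topics)

print(enc.categories_)   # column order chosen by the encoder (alphabetical)
print(one_hot)           # each topic becomes a 5-dimensional 0/1 vector
```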

SLIDE 28

Categorical features

  • Represent an object as the list of presence/absence (or counts) of features that appear in it.

  • Example: small molecules

features = atoms and bonds of a certain type

– C, H, S, O, N...
– O-H, O=C, C-N...

SLIDE 29

  • Hamming distance

Number of bits that are different. Equivalent to?

[Figure: binary representation of an object; bit = 0 if no occurrence of the feature, 1 if one or more occurrences (illustrated on the 1st and 10th features).]

SLIDE 30

  • Hamming distance

Number of bits that are different. Equivalent to the Manhattan (L1) distance on binary vectors.

[Figure: binary representation of an object; bit = 0 if no occurrence of the feature, 1 if one or more occurrences.]

SLIDE 31

  • Tanimoto/Jaccard similarity

Number of shared features (normalized): features present in both objects, divided by features present in either object.

[Figure: binary representations of two objects.]

SLIDE 32

  • MinMax similarity

Number of shared features (normalized): sum of the elementwise minima of the two count vectors, divided by the sum of their elementwise maxima. If x is binary, MinMax and Tanimoto are equivalent.

[Figure: counts representation of an object; entry = 0 if no occurrence of the feature, otherwise the number of occurrences (illustrated on the 1st and 10th features).]
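
Minimal NumPy sketches of the three measures; the function names are mine, and the definitions assume the usual shared-over-total normalizations:

```python
import numpy as np

def hamming_distance(a, b):
    # Number of positions where the binary vectors differ.
    return int(np.sum(a != b))

def tanimoto_similarity(a, b):
    # Features present in both objects, over features present in either (Jaccard index).
    return np.sum((a == 1) & (b == 1)) / np.sum((a == 1) | (b == 1))

def minmax_similarity(a, b):
    # Sum of elementwise minima over sum of elementwise maxima (count vectors).
    return np.sum(np.minimum(a, b)) / np.sum(np.maximum(a, b))

a = np.array([1, 0, 1, 1, 0])
b = np.array([1, 1, 1, 0, 0])
print(hamming_distance(a, b), tanimoto_similarity(a, b), minmax_similarity(a, b))
```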

SLIDE 33

Categorical features

  • Features
  • Compute the Hamming distance and the Tanimoto and MinMax similarities between these objects:

?

SLIDE 34

Categorical features

  • Features
  • Compute the Hamming distance and the Tanimoto and MinMax similarities between these objects:

A: 100011010110 (binary) / 300011010120 (counts)
B: 111011011110 (binary) / 211021011120 (counts)
C: 111011010100 (binary) / 311011010100 (counts)

SLIDE 35

Categorical features

  • A = 100011010110 / 300011010120
  • B = 111011011110 / 211021011120
  • C = 111011010100 / 311011010100
  • Hamming distance

d(A, B) = 3    d(A, C) = 3    d(B, C) = 2

  • Tanimoto similarity

s(A, B) = 6/9 ≈ 0.67    s(A, C) = 5/8 ≈ 0.63    s(B, C) = 7/9 ≈ 0.78

  • MinMax similarity

s(A, B) = 8/13 ≈ 0.62    s(A, C) = 7/11 ≈ 0.64    s(B, C) = 8/13 ≈ 0.62
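
A self-contained check of these values (same definitions as the sketch above, restated so the snippet runs on its own):

```python
import numpy as np

def bits(s):
    # Parse a string such as "100011010110" into an integer vector.
    return np.array([int(c) for c in s])

A, B, C = bits("100011010110"), bits("111011011110"), bits("111011010100")     # binary
Ac, Bc, Cc = bits("300011010120"), bits("211021011120"), bits("311011010100")  # counts

hamming  = lambda a, b: int(np.sum(a != b))
tanimoto = lambda a, b: np.sum(a & b) / np.sum(a | b)
minmax   = lambda a, b: np.sum(np.minimum(a, b)) / np.sum(np.maximum(a, b))

print(hamming(A, B), hamming(A, C), hamming(B, C))      # 3, 3, 2
print(tanimoto(A, B), tanimoto(A, C), tanimoto(B, C))   # ~0.67, 0.625, ~0.78
print(minmax(Ac, Bc), minmax(Ac, Cc), minmax(Bc, Cc))   # ~0.62, ~0.64, ~0.62
```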

SLIDE 36

Categorical features

  • Features
  • When new data has unknown features: ignore them.

SLIDE 37

Back to nearest neighbors

SLIDE 38

Advantages of kNN

  • Training is very fast

– Just store the training examples.
– Can use smart indexing procedures to speed up testing (at the cost of slower training).

  • Keeps the training data

– Useful if we want to do something else with it.

  • Rather robust to noisy data (averaging k votes)
  • Can learn complex functions
SLIDE 39

Drawbacks of kNN

  • Memory requirements
  • Prediction can be slow.

– Complexity of labeling 1 new data point?

SLIDE 40

Drawbacks of kNN

  • Memory requirements
  • Prediction can be slow.

Complexity of labeling 1 new data point: O(np), since we compute a distance in p dimensions to each of the n training points. But kNN works best with lots of samples...

→ Efficient data structures (k-d trees, ball trees)

  • construction: O(n log n) time, O(n) space
  • query: O(log n) on average (in low dimension)

→ Approximate solutions based on hashing

  • kNN is fooled by irrelevant attributes.

E.g. if p = 1000 and only 10 features are relevant, distances become meaningless.
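
In scikit-learn the indexing structure can be selected explicitly; a sketch on synthetic data comparing brute force with the tree-based indexes (the dataset is made up):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# 'kd_tree' and 'ball_tree' pay an indexing cost at fit time
# to make the neighbor queries at prediction time faster than brute force.
for algorithm in ["brute", "kd_tree", "ball_tree"]:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X, y)
    print(algorithm, clf.score(X[:100], y[:100]))
```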

SLIDE 41

Decision boundary of kNN

  • Classification
  • Decision boundary: line separating the positive from the negative regions.

  • What decision boundary is the kNN building?
SLIDE 42

Voronoi tessellation

  • Voronoi cell of x:

– set of all points of the space closer to x than to any other point of the training set
– a polyhedron

  • Voronoi tessellation of the space: union of all Voronoi cells.

Draw the Voronoi cell of the blue dot.

SLIDE 43

Voronoi tessellation

  • Voronoi cell of x:

– set of all points of the space closer to x than to any other point of the training set
– a polyhedron

  • Voronoi tessellation of the space: union of all Voronoi cells.

SLIDE 44

Voronoi tessellation

  • The Voronoi tessellation defines the decision boundary of the 1-NN.

  • The kNN also partitions the space (in a more complex way).
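
One way to visualize this claim: color a fine grid by its 1-NN label and overlay the Voronoi diagram of the training points, so that each decision region appears as a union of Voronoi cells (toy data; matplotlib and scipy assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(15, 2))        # toy training points
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # toy labels

# 1-NN label of every grid point: the decision regions are unions of Voronoi cells.
xx, yy = np.meshgrid(np.linspace(0, 1, 300), np.linspace(0, 1, 300))
grid = np.c_[xx.ravel(), yy.ravel()]
nearest = np.argmin(((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1), axis=1)
plt.contourf(xx, yy, y[nearest].reshape(xx.shape), alpha=0.3)

voronoi_plot_2d(Voronoi(X), ax=plt.gca(), show_vertices=False)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
```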

SLIDE 45

Curse of dimensionality

  • Remember from Chap 3
  • When p ↗, the proportion of a hypercube's volume outside of its inscribed hypersphere approaches 1.

  • Volume of a p-sphere of radius r: V_p(r) = π^(p/2) r^p / Γ(p/2 + 1)
  • What this means:

– hyperspace is very big
– all points are far apart
– dimensionality reduction is needed.

SLIDE 46

kNN variants

  • ε-ball neighbors

– Instead of using the k nearest neighbors, use all points within a distance ε of the test point.

– What if there are no such points?
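
scikit-learn ships this variant as RadiusNeighborsClassifier; its outlier_label argument is one way to handle queries with no neighbor within ε (toy data):

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X = np.array([[0.0], [0.2], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])

# Use every training point within distance radius of the query instead of a fixed k.
clf = RadiusNeighborsClassifier(radius=0.3, outlier_label=-1).fit(X, y)
print(clf.predict([[0.1], [5.0]]))   # [0, -1]: no training point within 0.3 of 5.0
```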

SLIDE 47

kNN variants

  • Weighted kNN

– Weigh the vote of each neighbor according to its distance to the test point.

– Variant: learn the optimal weights [e.g. Swamidass, Azencott et al. 2009, Influence Relevance Voter]
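
Distance-weighted voting is available in scikit-learn via weights='distance'; learning the weights, as in the Influence Relevance Voter, is not in the library, so this only sketches the simple variant (synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Each neighbor's vote is weighted by 1 / distance to the test point,
# so closer neighbors count more than distant ones.
clf = KNeighborsClassifier(n_neighbors=7, weights="distance").fit(X, y)
print(clf.predict(X[:5]))
```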

SLIDE 48

Collaborative filtering

  • Collaborative filtering: recommend items that similar users have liked in the past

similar users = users with similar tastes

  • item-based kNN

– similarity between items: adjusted cosine similarity, computed over the users u that rated both item A and item B:

s(A, B) = Σ_u (r(u,A) − r̄(u)) (r(u,B) − r̄(u)) / ( √(Σ_u (r(u,A) − r̄(u))²) √(Σ_u (r(u,B) − r̄(u))²) )

where r(u,A) is the rating of item A by user u and r̄(u) is the average rating by user u.

SLIDE 49

Collaborative filtering

– score of item A for user u: similarity-weighted combination of u's ratings of the k nearest neighbors of A (according to s) among the items rated by user u.
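
A NumPy sketch of item-based collaborative filtering along the lines of these two slides, on a made-up rating matrix (rows = users, columns = items, 0 = not rated); the exact normalization of the score is my assumption:

```python
import numpy as np

# Toy ratings: rows = users, columns = items, 0 = not rated (made-up data).
R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4],
              [0, 1, 5, 4]], dtype=float)

user_mean = np.nanmean(np.where(R > 0, R, np.nan), axis=1)  # average rating per user

def adjusted_cosine(a, b):
    # Similarity of items a and b over the users that rated both,
    # after subtracting each user's average rating.
    both = (R[:, a] > 0) & (R[:, b] > 0)
    da, db = R[both, a] - user_mean[both], R[both, b] - user_mean[both]
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    return float(da @ db / denom) if denom > 0 else 0.0

def score(u, a, k=2):
    # Score of item a for user u: similarity-weighted average of u's ratings
    # of the k most similar items among those u has already rated.
    rated = [b for b in range(R.shape[1]) if b != a and R[u, b] > 0]
    top = sorted(((adjusted_cosine(a, b), b) for b in rated), reverse=True)[:k]
    den = sum(abs(s) for s, _ in top)
    return sum(s * R[u, b] for s, b in top) / den if den > 0 else 0.0

print(score(u=3, a=0))   # predicted score of item 0 for user 3
```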

SLIDE 50

Summary

  • kNN

– very simple training
– prediction can be expensive

  • Relies on a "good" distance/similarity between instances

  • Decision boundary = Voronoi tessellation
  • Curse of dimensionality: hyperspace is very big.
SLIDE 51

References

  • A Course in Machine Learning.

http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf

– kNN: Chap 3.2 – 3.3
– Categorical variables: Chap 3.1
– Curse of dimensionality: Chap 3.5

  • More on

– Kd-trees

https://www.ri.cmu.edu/pub_files/pub1/moore_andrew_1991_1/moore_andrew_1991_1.pdf
http://www.alglib.net/other/nearestneighbors.php

– Voronoi tessellation

http://philogb.github.io/blog/2010/02/12/voronoi-tessellation/

SLIDE 52

Lab

SLIDE 53

SLIDE 54

Even though we use the same scoring strategy, we don't get the same optimum. That's because the cross-validation evaluation strategy is different: scikit-learn computes one AUC per fold and averages them.

SLIDE 55

The kNN performs much worse than the linear models. With such a large number of features, this is not unexpected.

SLIDE 56

SLIDE 57

Computing nearest neighbors based on correlation works better than using Minkowski distances. Indeed, this compares the profiles of the gene expressions (which genes have high / low expression simultaneously). Still, logistic regression works best.