
Instance-Based Learning 1. The k-NN algorithm: simple application - PowerPoint PPT Presentation



  1. 0. Instance-Based Learning

  2. 1. The k-NN algorithm: simple application. CMU, 2006 fall, final exam, pr. 2

Consider the training set in the 2-dimensional Euclidean space shown in the nearby table:

    x    y   class
   −1    1     −
    0    1     +
    0    2     −
    1   −1     −
    1    0     +
    1    2     +
    2    2     −
    2    3     +

a. Represent the training data in the 2D space.
b. What are the predictions of the 3-, 5- and 7-nearest-neighbor classifiers at the point (1, 1)?

Solution: b. k = 3: +; k = 5: +; k = 7: −.
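
To double-check the answer at point b, one can run the vote directly. Below is a minimal Python sketch (the `data` list simply transcribes the table above; `knn_predict` is a helper name introduced here, not part of the exam problem):

```python
from collections import Counter
import math

# Training set from the table: ((x, y), class)
data = [((-1, 1), '-'), ((0, 1), '+'), ((0, 2), '-'), ((1, -1), '-'),
        ((1, 0), '+'), ((1, 2), '+'), ((2, 2), '-'), ((2, 3), '+')]

def knn_predict(query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    neighbors = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

for k in (3, 5, 7):
    print(f"k = {k}: {knn_predict((1, 1), k)}")   # prints +, +, - respectively
```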

  3. 2. Drawing decision boundaries and decision surfaces for the 1-NN classifier Voronoi Diagrams CMU, 2010 spring, E. Xing, T. Mitchell, A. Singh, HW1, pr. 3.1

  4. 3. For each of these figures, we are given a few data points in 2-d space, each of which is labeled as either positive (blue) or negative (red). Assuming that we are using the L2 distance as a distance metric, draw the decision boundary for the 1-NN classifier for each case. [Figures: square plots with corners at (−4, −4) and (4, 4).]

  5. 4. Solution [Figures showing the 1-NN decision boundaries for the cases above; axes from (−4, −4) to (4, 4).]

  6. 5. [More solution figures; axes from (−4, −4) to (4, 4).]

  7. 6. Drawing decision boundaries and decision surfaces for the 1-NN classifier Voronoi Diagrams: DO IT YOURSELF CMU, 2010 fall, Ziv Bar-Joseph, HW1, pr. 3.1

  8. 7. For each of the nearby figures, you are given negative (◦) and positive (+) data points in the 2D space. Remember that a 1-NN classifier classifies a point according to the class of its nearest neighbour. Please draw the Voronoi diagram for a 1-NN classifier using Euclidean distance as the distance metric for each case. [Figures: square plots with axes from −2 to 2.]
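
The data points in these exercises are given only graphically, so the following is just a sketch with made-up coordinates (`points` and `labels` are assumptions) showing how the 1-NN decision regions, i.e. the Voronoi cells merged by class, can be rendered by brute force on a grid:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical labeled points standing in for one of the panels.
points = np.array([[-1.0, 1.0], [0.5, 0.5], [1.0, -1.0], [-0.5, -1.5], [1.5, 1.5]])
labels = np.array([0, 1, 0, 1, 1])   # 0 = negative (o), 1 = positive (+)

# Classify every grid cell by the label of its nearest training point;
# the color changes exactly along the 1-NN decision boundary.
xx, yy = np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-2, 2, 400))
grid = np.c_[xx.ravel(), yy.ravel()]
dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
pred = labels[dists.argmin(axis=1)].reshape(xx.shape)

plt.contourf(xx, yy, pred, levels=[-0.5, 0.5, 1.5], alpha=0.3)
plt.scatter(points[:, 0], points[:, 1], c=labels, edgecolors='k')
plt.title("1-NN decision regions (class-merged Voronoi diagram)")
plt.show()
```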

  9. 8. [Figure: a 2D plot with axes from −2 to 2; no further text on this slide.]

  10. 9. Decision boundaries and decision surfaces: Comparison between the 1-NN and ID3 classifiers CMU, 2007 fall, Carlos Guestrin, HW2, pr. 1.4

  11. 10. For the data in the figure(s) below, sketch the decision surfaces obtained by applying a. the K-Nearest Neighbors algorithm with K = 1; b. the ID3 algorithm augmented with [the capacity to process] continuous attributes. [Two figures, each with x and y axes from 0 to 6.]

  12. 11. Solution: 1-NN [Two figures showing the 1-NN decision surfaces; axes from 0 to 6.]

  13. 12. Solution: ID3 [Two figures showing the ID3 decision surfaces, with the split thresholds marked on the axes; axes from 0 to 6.]
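
For a numerical counterpart to these sketches, the two decision surfaces can be generated with scikit-learn. The coordinates below are made up (the real points appear only in the figures), and `DecisionTreeClassifier` implements CART rather than ID3, but it likewise splits continuous attributes with axis-parallel thresholds, so the qualitative shape of the surface is comparable:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training points on the 0..6 grid.
X = np.array([[1, 1], [2, 5], [3, 2], [4, 4], [5, 1], [5, 5]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])

xx, yy = np.meshgrid(np.linspace(0, 6, 300), np.linspace(0, 6, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
models = [("1-NN", KNeighborsClassifier(n_neighbors=1)),
          ("Decision tree (axis-parallel splits)", DecisionTreeClassifier(random_state=0))]
for ax, (name, model) in zip(axes, models):
    pred = model.fit(X, y).predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, pred, alpha=0.3)     # piecewise-linear vs. rectangular regions
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    ax.set_title(name)
plt.show()
```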

  14. 13. Instance-Based Learning Some important properties

  15. 14. k-NN and the Curse of Dimensionality: Proving that the number of examples needed by k-NN grows exponentially with the number of features. CMU, 2010 fall, Aarti Singh, HW2, pr. 2.2 [Slides originally drawn by Diana Mînzat, MSc student, FII, 2015 spring]

  16. 15. Consider a set of n points x_1, x_2, ..., x_n independently and uniformly drawn from a p-dimensional zero-centered unit ball B = {x : ||x|| ≤ 1} ⊂ R^p, where ||x|| = √(x · x) and · is the inner product in R^p. In this problem we will study the size of the 1-nearest neighbourhood of the origin O and how it changes in relation to the dimension p, thereby gaining intuition about the downside of k-NN in a high-dimensional space. Formally, this size will be described as the distance from O to its nearest neighbour in the set {x_1, ..., x_n}, denoted by d*:

d* := min_{1 ≤ i ≤ n} ||x_i||,

which is a random variable since the sample is random.

  17. 16. a. For p = 1, calculate P(d* ≤ t), the cumulative distribution function (c.d.f.) of d*, for t ∈ [0, 1].

Solution: In the one-dimensional space (p = 1), the unit ball is the interval [−1, 1]. The cumulative distribution function will have the following expression:

F_{n,1}(t) := P(d* ≤ t) = 1 − P(d* > t) = 1 − P(|x_i| > t, for i = 1, 2, ..., n).

Because the points x_1, ..., x_n were generated independently, the c.d.f. can also be written as:

F_{n,1}(t) = 1 − ∏_{i=1}^{n} P(|x_i| > t) = 1 − (1 − t)^n.

  18. 17. b. Find the formula of the cumulative distribution function of d* for the general case, when p ∈ {1, 2, 3, ...}.

Hint: You may find the following fact useful: the volume of a p-dimensional ball with radius r is

V_p(r) = (r √π)^p / Γ(p/2 + 1),

where Γ is Euler's Gamma function, defined by Γ(1/2) = √π, Γ(1) = 1, and Γ(x + 1) = x Γ(x) for any x > 0.

Note: It can easily be shown that Γ(n + 1) = n! for all n ∈ N*, therefore the Gamma function is a generalization of the factorial function.
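
As a side note, the volume formula in the hint is easy to evaluate numerically; a small sketch using `scipy.special.gamma` (checking, for instance, that V_2(1) = π and V_3(1) = 4π/3):

```python
import math
from scipy.special import gamma

def ball_volume(p, r=1.0):
    """Volume of a p-dimensional ball of radius r: (r*sqrt(pi))^p / Gamma(p/2 + 1)."""
    return (r * math.sqrt(math.pi)) ** p / gamma(p / 2 + 1)

print(ball_volume(2), math.pi)            # both ~3.14159
print(ball_volume(3), 4 * math.pi / 3)    # both ~4.18879
```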

  19. 18. Solution: In the general case, i.e. considering a fixed p ∈ N*, it is obvious that the cumulative distribution function of d* will have a form similar to the p = 1 case:

F_{n,p}(t) := P(d* ≤ t) = 1 − P(d* > t) = 1 − P(||x_i|| > t, i = 1, 2, ..., n) = 1 − ∏_{i=1}^{n} P(||x_i|| > t).

Denoting the volume of the ball of radius t by V_p(t), and knowing that the points x_1, ..., x_n follow a uniform distribution, we can rewrite the above formula as follows:

F_{n,p}(t) = 1 − ((V_p(1) − V_p(t)) / V_p(1))^n = 1 − (1 − V_p(t)/V_p(1))^n.

Using the suggested formula for the volume of the ball, it follows immediately that F_{n,p}(t) = 1 − (1 − t^p)^n.
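
The closed form F_{n,p}(t) = 1 − (1 − t^p)^n can also be sanity-checked by simulation. A minimal sketch (the values of n, p, t are arbitrary; scaling a random direction by U^{1/p} is a standard way of drawing uniformly from the unit ball):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ball(n, p):
    """Draw n points uniformly from the p-dimensional unit ball."""
    directions = rng.normal(size=(n, p))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(n, 1)) ** (1.0 / p)   # uniform inside the ball
    return directions * radii

n, p, t, trials = 20, 3, 0.5, 10000
hits = sum(np.linalg.norm(sample_ball(n, p), axis=1).min() <= t for _ in range(trials))
print("Monte Carlo estimate:", hits / trials)
print("1 - (1 - t**p)**n  :", 1 - (1 - t ** p) ** n)
```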

  20. 19. c. What is the median of the random variable d* (i.e., the value of t for which P(d* ≤ t) = 1/2)? The answer should be a function of both the sample size n and the dimension p. Fix n = 100 and plot the values of the median function for p = 1, 2, 3, ..., 100, with the median values on the y-axis and the values of p on the x-axis. What do you see?

Solution: In order to find the median value of the random variable d*, we solve the equation P(d* ≤ t) = 1/2 in the variable t:

P(d* ≤ t) = 1/2 ⇔ F_{n,p}(t) = 1/2 ⇔ 1 − (1 − t^p)^n = 1/2 ⇔ (1 − t^p)^n = 1/2 ⇔ 1 − t^p = 1/2^{1/n} ⇔ t^p = 1 − 1/2^{1/n}.

Therefore,

t_med(n, p) = (1 − 1/2^{1/n})^{1/p}.

  21. 20. The plot of the function t_med(100, p) for p = 1, 2, ..., 100: [figure; the median rises steeply from near 0 towards 1 as p grows].

Remark: The minimal sphere containing the nearest neighbour of the origin in the set {x_1, x_2, ..., x_n} grows very fast as the value of p increases. When p becomes greater than 10, most of the 100 training instances are closer to the surface of the unit ball than to the origin O.
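
The plot of t_med(100, p) is a one-liner once the closed form is known; a minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 100
p = np.arange(1, 101)
t_med = (1 - 0.5 ** (1 / n)) ** (1 / p)   # median distance from O to its nearest neighbour

plt.plot(p, t_med)
plt.xlabel("p")
plt.ylabel("t_med(100, p)")
plt.title("Median 1-NN distance from the origin, n = 100")
plt.show()
```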

  22. 21. d. Use the c.d.f. derived at point b to determine how large the sample size n should be such that, with probability at least 0.9, the distance d* from O to its nearest neighbour is less than 1/2, i.e., half way from O to the boundary of the ball. The answer should be a function of p. Plot this function for p = 1, 2, ..., 20, with the function values on the y-axis and the values of p on the x-axis. What do you see?

Hint: You may find useful the Taylor series expansion of ln(1 − x):

ln(1 − x) = − ∑_{i=1}^{∞} x^i / i, for −1 ≤ x < 1.

  23. 22. Solution:

P(d* ≤ 0.5) ≥ 0.9 ⇔ F_{n,p}(0.5) ≥ 9/10 ⇔ 1 − (1 − 1/2^p)^n ≥ 9/10 ⇔ (1 − 1/2^p)^n ≤ 1/10
⇔ n · ln(1 − 1/2^p) ≤ −ln 10 ⇔ n ≥ ln 10 / (−ln(1 − 1/2^p)).

Using the decomposition of ln(1 − 1/2^p) into a Taylor series (with x = 1/2^p), we obtain:

P(d* ≤ 0.5) ≥ 0.9 ⇒ n ≥ (ln 10) · 2^p / (1 + (1/2) · (1/2^p) + (1/3) · (1/2^{2p}) + ... + (1/n) · (1/2^{(n−1)p}) + ...)
⇒ n ≥ 2^{p−1} ln 10.

  24. 23. Note: In order to obtain the last inequality in the above calculations, we considered the following two facts:

i. 1/(3 · 2^p) < 1/4 holds for any p ≥ 1, and
ii. (1/n) · (1/2^{(n−1)p}) ≤ 1/2^n ⇔ 2^n ≤ n · 2^{(n−1)p} holds for any p ≥ 1 and n ≥ 2 (this can be proven by induction on p).

So we got:

1 + (1/2) · (1/2^p) + (1/3) · (1/2^{2p}) + ... + (1/n) · (1/2^{(n−1)p}) + ... < 1 + 1/2 + 1/4 + ... + 1/2^n + ... = 1/(1 − 1/2) = 2.

  25. 24. [Plot of n(p) = ln 10 / (−ln(1 − 2^{−p})), scaled by 10^{−6} on the y-axis, for p = 1, ..., 20.]

The proven result, P(d* ≤ 0.5) ≥ 0.9 ⇒ n ≥ 2^{p−1} ln 10, means that the sample size needed to make the probability that d* < 0.5 large enough (9/10) grows exponentially with p.
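
The exact threshold n(p) = ln 10 / (−ln(1 − 2^{−p})) and the bound 2^{p−1} ln 10 can be plotted directly; a small sketch (drawn on a log scale here, instead of dividing by 10^6 as on the slide):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.arange(1, 21)
n_exact = np.ceil(np.log(10) / -np.log1p(-0.5 ** p))   # smallest n with P(d* <= 0.5) >= 0.9
n_bound = 2.0 ** (p - 1) * np.log(10)                   # the lower bound 2^(p-1) ln 10

plt.semilogy(p, n_exact, label="exact n(p)")
plt.semilogy(p, n_bound, "--", label="2^(p-1) ln 10")
plt.xlabel("p")
plt.ylabel("required sample size n")
plt.legend()
plt.show()
```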

  26. 25. e. Having solved the previous problems, what will you say about the downside of k -NN in terms of n and p ? Solution: The k -NN classifier works well when a test instance has a “dense” neighbourhood in the training data. However, the analysis here suggests that in order to provide a dense neighbourhood, the size of the training sample should be exponential in the dimension p , which is clearly infeasible for a large p . (Remember that p is the dimension of the space we work in, i.e. the number of features of the training instances.)
