BAYES AND NEAREST NEIGHBOR CLASSIFIERS
Matthieu R. Bloch
Tuesday, January 21, 2020
LOGISTICS

TAs and office hours:
- Monday: Mehrdad (TSRB 523a), 2pm-3:15pm
- Tuesday: TJ (VL C449 Cubicle D), 1:30pm-2:45pm
- Wednesday: Matthieu (TSRB 423), 12:00pm-1:15pm
- Thursday: Hossein (VL C449 Cubicle B), 10:45am-12:00pm
- Friday: Brighton (TSRB 523a), 12pm-1:15pm

Homework 1 posted on Canvas
- Due Wednesday January 29, 2020 (11:59PM EST) (Wednesday February 5, 2020 for DL)
RECAP: BAYES CLASSIFIER

What is the best (smallest) risk that we can achieve?
- Assume that we actually know $P_X$ and $P_{Y|X}$
- Denote the a posteriori class probabilities of $x \in \mathcal{X}$ by $\eta_k(x) \triangleq \mathbb{P}(Y = k | X = x)$
- Denote the a priori class probabilities by $\pi_k \triangleq \mathbb{P}(Y = k)$

Lemma (Bayes classifier). The classifier $h_B(x) \triangleq \operatorname*{argmax}_{k \in [0;K-1]} \eta_k(x)$ is optimal, i.e., for any classifier $h$, we have $R(h_B) \leq R(h)$. Moreover,
$$R(h_B) = \mathbb{E}_X\left[1 - \max_k \eta_k(X)\right].$$

Terminology:
- $h_B$ is called the Bayes classifier
- $R_B \triangleq R(h_B)$ is called the Bayes risk
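Not part of the original slides: a minimal numerical sketch of the lemma, assuming the posteriors $\eta_k(x)$ are known at a few points. The toy values and function names are made up for illustration.

```python
# Illustrative sketch: the Bayes classifier as an argmax over posterior class
# probabilities, and its conditional error 1 - max_k eta_k(x).
import numpy as np

def bayes_classifier(eta):
    """eta: array of shape (n_points, K) with eta[i, k] = P(Y = k | X = x_i)."""
    return np.argmax(eta, axis=1)

def conditional_bayes_risk(eta):
    """P(error | X = x_i) for the Bayes classifier: 1 - max_k eta_k(x_i)."""
    return 1.0 - np.max(eta, axis=1)

# Toy posteriors at three points (K = 3 classes); each row sums to one.
eta = np.array([[0.7, 0.2, 0.1],
                [0.3, 0.4, 0.3],
                [0.1, 0.1, 0.8]])
print(bayes_classifier(eta))        # [0 1 2]
print(conditional_bayes_risk(eta))  # [0.3 0.6 0.2]
```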
OTHER FORMS OF THE BAYES CLASSIFIER

$$h_B(x) \triangleq \operatorname*{argmax}_{k \in [0;K-1]} \eta_k(x) = \operatorname*{argmax}_{k \in [0;K-1]} \pi_k \, p_{X|Y}(x|k)$$

- For $K = 2$ (binary classification): log-likelihood ratio test
  $$\log \frac{p_{X|Y}(x|1)}{p_{X|Y}(x|0)} \gtrless \log \frac{\pi_0}{\pi_1}$$
- If all classes are equally likely, $\pi_0 = \pi_1 = \cdots = \pi_{K-1}$:
  $$h_B(x) \triangleq \operatorname*{argmax}_{k \in [0;K-1]} p_{X|Y}(x|k)$$

Example (Bayes classifier). Assume $X|Y=0 \sim \mathcal{N}(0,1)$ and $X|Y=1 \sim \mathcal{N}(1,1)$ with $\pi_0 = \pi_1$. The Bayes risk is $R(h_B) = \Phi\left(-\tfrac{1}{2}\right)$, with $\Phi \triangleq$ the standard Normal CDF.

In practice we do not know $P_X$ and $P_{Y|X}$.
- Plugin methods: use the data to learn the distributions and plug the result into the Bayes classifier.
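A hedged sketch of the Gaussian example above. With equal priors, the Bayes rule reduces to thresholding $x$ at $1/2$; the script checks the exact risk $\Phi(-1/2)$ against a Monte Carlo estimate. The sample size and seed are arbitrary choices.

```python
# Gaussian example: X|Y=0 ~ N(0,1), X|Y=1 ~ N(1,1), pi_0 = pi_1 = 1/2.
# The Bayes classifier decides 1 iff x > 1/2, and the Bayes risk is Phi(-1/2).
import numpy as np
from scipy.stats import norm

exact_risk = norm.cdf(-0.5)          # Phi(-1/2) ~ 0.3085

rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)       # equally likely classes
x = rng.normal(loc=y, scale=1.0)     # X | Y = y ~ N(y, 1)
y_hat = (x > 0.5).astype(int)        # Bayes rule: decide 1 iff x > 1/2
mc_risk = np.mean(y_hat != y)

print(exact_risk, mc_risk)           # both close to 0.3085
```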
OTHER LOSS FUNCTIONS

We have focused on the risk $\mathbb{P}(h(X) \neq Y)$, obtained for the binary loss function $\mathbb{1}\{h(X) \neq Y\}$.

There are many situations in which this is not appropriate:
- Cost-sensitive classification: false alarm and missed detection may not be equivalent
  $$c_0 \, \mathbb{1}\{h(X) \neq 0 \text{ and } Y = 0\} + c_1 \, \mathbb{1}\{h(X) \neq 1 \text{ and } Y = 1\}$$
- Unbalanced data set: the probability of the largest class will dominate

More to explore in the next homework!
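As an illustration (not stated on the slide), the standard cost-weighted extension of the Bayes rule decides $h(x) = 1$ iff $c_1 \eta_1(x) > c_0 \eta_0(x)$. The sketch below compares it to the 0-1 rule on the Gaussian example from the previous slide; the costs $c_0, c_1$ are arbitrary.

```python
# Cost-sensitive classification on the Gaussian example: unequal costs move
# the decision threshold away from the 0-1 (posterior > 1/2) rule.
import numpy as np
from scipy.stats import norm

c0, c1 = 1.0, 5.0                    # missing class 1 is 5x more costly
rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=y, scale=1.0)

# Posterior of class 1 for pi_0 = pi_1 = 1/2
eta1 = norm.pdf(x, 1, 1) / (norm.pdf(x, 0, 1) + norm.pdf(x, 1, 1))
y_01   = (eta1 > 0.5).astype(int)                    # 0-1 loss rule
y_cost = (c1 * eta1 > c0 * (1 - eta1)).astype(int)   # cost-weighted rule

def cost_risk(y_hat):
    """Empirical expected cost of the decisions y_hat."""
    return np.mean(c0 * ((y_hat != 0) & (y == 0)) + c1 * ((y_hat != 1) & (y == 1)))

print(cost_risk(y_01), cost_risk(y_cost))  # cost-weighted rule has lower expected cost
```

Raising $c_1$ pushes the threshold toward class 0's region, trading more false alarms for fewer costly missed detections.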
NEAREST NEIGHBOR CLASSIFIER

Back to our training dataset $\mathcal{D} \triangleq \{(x_1, y_1), \cdots, (x_N, y_N)\}$.

The nearest-neighbor (NN) classifier is $h_{NN}(x) \triangleq y_{NN(x)}$, where $NN(x) \triangleq \operatorname*{argmin}_i \|x_i - x\|$.

Risk of the NN classifier conditioned on $x$ and $x_{NN(x)}$:
$$R_{NN}(x, x_{NN(x)}) = \sum_k \eta_k(x_{NN(x)})(1 - \eta_k(x)) = \sum_k \eta_k(x)(1 - \eta_k(x_{NN(x)})).$$

How well does the average risk $R_{NN} = R(h_{NN})$ compare to the Bayes risk for large $N$?

Lemma. Let $x$, $\{x_i\}_{i=1}^N \sim P_X$ be i.i.d. in a separable metric space $\mathcal{X}$. Let $x_{NN(x)}$ be the nearest neighbor of $x$. Then $x_{NN(x)} \to x$ with probability one as $N \to \infty$.

Theorem (Binary NN classifier). Let $\mathcal{X}$ be a separable metric space. Let $p(x|y=0)$, $p(x|y=1)$ be such that, with probability one, $x$ is either a continuity point of $p(x|y=0)$ and $p(x|y=1)$ or a point of non-zero probability measure. Then, as $N \to \infty$,
$$R(h_B) \leq R(h_{NN}) \leq 2 R(h_B)(1 - R(h_B)).$$
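A minimal NumPy sketch of the nearest-neighbor rule defined above; the data and the helper name `nn_classify` are illustrative, not from the slides.

```python
# 1-NN rule: predict the label of the closest training point (Euclidean norm).
import numpy as np

def nn_classify(x_train, y_train, x_query):
    """1-NN prediction for each row of x_query."""
    # Pairwise distances, shape (n_query, n_train)
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nn_idx = np.argmin(dists, axis=1)   # NN(x) = argmin_i ||x_i - x||
    return y_train[nn_idx]

# Tiny illustration: two classes with means (0,0) and (2,2)
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=100)
x_train = rng.normal(size=(100, 2)) + 2.0 * y_train[:, None]
x_query = np.array([[0.1, -0.2], [2.3, 1.8]])
print(nn_classify(x_train, y_train, x_query))   # most likely [0 1]
```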
K NEAREST NEIGHBOR CLASSIFIER

Can drive the risk of the NN classifier to the Bayes risk by increasing the size of the neighborhood.

$h_{K\text{-NN}}$: assign a label to $x$ by taking a majority vote among the $K$ nearest neighbors.
$$\lim_{N \to \infty} \mathbb{E}\left[R(h_{K\text{-NN}})\right] \leq \left(1 + \sqrt{\frac{2}{K}}\right) R(h_B)$$

Definition. Let $\hat{h}_N$ be a classifier learned from a set of $N$ data points. The classifier is consistent if $\mathbb{E}[R(\hat{h}_N)] \to R_B$ as $N \to \infty$.

Theorem (Stone's Theorem). If $N \to \infty$, $K \to \infty$, and $K/N \to 0$, then $h_{K\text{-NN}}$ is consistent.

Choosing $K$ is a problem of model selection.
- Do not choose $K$ by minimizing the empirical risk on the training set: for $K = 1$,
  $$\hat{R}_N(h_{1\text{-NN}}) = \frac{1}{N} \sum_{i=1}^N \mathbb{1}\{h_{1\text{-NN}}(x_i) \neq y_i\} = 0.$$
- Need to rely on estimates from model selection techniques (more later!)
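To make the last point concrete, here is a sketch of choosing $K$ on held-out data rather than on the training set, using scikit-learn's `KNeighborsClassifier`; the synthetic data, split, and candidate values of $K$ are arbitrary assumptions.

```python
# Training risk of 1-NN is always 0, so compare candidate K on a validation set.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]      # two overlapping Gaussian classes

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for K in [1, 3, 11, 51]:
    clf = KNeighborsClassifier(n_neighbors=K).fit(X_tr, y_tr)
    # Training error is misleading (exactly 0 for K = 1); validation error is not.
    print(K, 1.0 - clf.score(X_tr, y_tr), 1.0 - clf.score(X_val, y_val))
```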
K NEAREST NEIGHBOR CLASSIFIER

Given enough data, a $K$-NN classifier will do just as well as pretty much any other method.
- The number of samples $N$ can be huge (especially in high dimension).
- The choice of $K$ matters a lot; model selection is important.
- Finding the nearest neighbors out of millions of datapoints is still computationally hard.
  - $k$-d trees help, but are still expensive in high dimension, when $N \approx d$.

We will discuss other classifiers that make more assumptions about the underlying data.
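As a sketch of the computational point, SciPy's `cKDTree` can be used for fast neighbor queries on large, low-dimensional data; the sizes and data below are illustrative only.

```python
# K-NN with a k-d tree: build once, then query K neighbors per test point.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200_000, 3))        # large, low-dimensional training set
y_train = rng.integers(0, 2, size=200_000)

tree = cKDTree(X_train)                        # build the k-d tree once
x_query = rng.normal(size=(5, 3))
dists, idx = tree.query(x_query, k=11)         # 11 nearest neighbors per query point

# Majority vote among the K neighbors (binary labels)
y_hat = (y_train[idx].mean(axis=1) > 0.5).astype(int)
print(y_hat)
```

For high-dimensional features, tree-based search degrades toward brute force, which is one reason the slide points ahead to classifiers that make stronger assumptions about the data.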