Nearest-Neighbor Classifier
MTL 782, IIT Delhi
Instance-Based Classifiers
• Store the training records
• Use the training records to predict the class label of unseen cases
[Figure: a table of stored cases with attributes Atr1 … AtrN and a Class column (labels A, B, C), alongside an unseen case with the same attributes but an unknown class]
Instance-Based Classifiers
• Examples:
– Rote-learner: memorizes the entire training data and classifies a record only if its attributes match one of the training examples exactly
– Nearest neighbor: uses the k "closest" points (nearest neighbors) to perform classification
Nearest Neighbor Classifiers
• Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck
[Figure: flow diagram showing a test record, the training records, a "compute distance" step, and a "choose the k nearest records" step]
Nearest-Neighbor Classifiers
[Figure: an unknown record plotted among labelled training points]
• Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
– Compute its distance to the training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor
[Figure: three panels around a test point x: (a) 1-nearest neighbor, (b) 2-nearest neighbors, (c) 3-nearest neighbors]
• The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
1-Nearest Neighbor
[Figure: Voronoi diagram. With k = 1, the training points partition the space into Voronoi cells, and each cell is assigned the class of the training point it contains]
Nearest Neighbor Classification
• Compute the distance between two points:
– Euclidean distance: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
– Manhattan distance: $d(p, q) = \sum_i |p_i - q_i|$
– Minkowski (r-norm) distance: $d(p, q) = \left(\sum_i |p_i - q_i|^r\right)^{1/r}$
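A minimal sketch of these three measures in Python with NumPy; the function names are ours, not from the slides:

```python
import numpy as np

def euclidean(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    # d(p, q) = sum_i |p_i - q_i|
    return np.sum(np.abs(p - q))

def minkowski(p, q, r):
    # d(p, q) = (sum_i |p_i - q_i|^r)^(1/r)
    # r = 1 recovers Manhattan distance, r = 2 recovers Euclidean distance
    return np.sum(np.abs(p - q) ** r) ** (1.0 / r)

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])
print(euclidean(p, q))        # ~3.6056
print(manhattan(p, q))        # 5.0
print(minkowski(p, q, r=2))   # ~3.6056, matches Euclidean
```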
• Determine the class from the nearest-neighbor list:
– Take the majority vote of class labels among the k nearest neighbors:
  $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
  where $D_z$ is the set of the k closest training examples to z, and $I(\cdot)$ is 1 when its argument is true and 0 otherwise
– Alternatively, weigh each vote according to distance:
  $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i)$
• Weight factor: $w_i = 1 / d(x', x_i)^2$
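A sketch of both voting schemes; `majority_vote` and `weighted_vote` are illustrative names, and the small epsilon is our own guard against a zero distance (it is not on the slide):

```python
from collections import Counter, defaultdict

def majority_vote(neighbor_labels):
    # y' = argmax_v sum_i I(v = y_i) over the k nearest neighbors
    return Counter(neighbor_labels).most_common(1)[0][0]

def weighted_vote(neighbor_labels, neighbor_distances, eps=1e-12):
    # y' = argmax_v sum_i w_i * I(v = y_i), with w_i = 1 / d_i^2
    # eps avoids division by zero when a neighbor coincides with the query
    scores = defaultdict(float)
    for label, d in zip(neighbor_labels, neighbor_distances):
        scores[label] += 1.0 / (d ** 2 + eps)
    return max(scores, key=scores.get)

print(majority_vote(["A", "B", "A"]))                   # A
print(weighted_vote(["A", "B", "B"], [0.1, 2.0, 3.0]))  # A: the very close neighbor outweighs two distant ones
```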
The kNN Classification Algorithm
Let k be the number of nearest neighbors and D be the set of training examples.
1. for each test example z = (x', y') do
2.   Compute d(x', x), the distance between z and every example (x, y) ∈ D
3.   Select D_z ⊆ D, the set of the k closest training examples to z
4.   $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
5. end for
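A direct transcription of this pseudocode into runnable Python, assuming the training data sits in a NumPy matrix; this is a sketch, not a production implementation:

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, k):
    # Step 2: compute d(x', x) between the query and every training example
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Step 3: select D_z, the indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Step 4: majority vote over the labels of D_z
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # A
```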
kNN Classification
[Figure: scatter plot of loan amount ($0 to $2,50,000) against age (0 to 70), with points labelled Default and Non-Default]
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, the classifier is sensitive to noise points
– If k is too large, the neighborhood may include points from other classes
[Figure: a test point X whose large-k neighborhood crosses into another class]
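The slide does not prescribe how to pick k; one common approach in practice is to cross-validate over candidate values. A sketch assuming scikit-learn is installed:

```python
# Compare candidate k values by 5-fold cross-validation on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 11, 51):
    clf = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"k={k:2d}  mean CV accuracy={score:.3f}")
# Very small k tends to fit noise; very large k pulls in points from other classes.
```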
Nearest Neighbor Classification…
• Scaling issues:
– Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
– Example:
  • Height of a person may vary from 1.5 m to 1.8 m
  • Weight of a person may vary from 60 kg to 100 kg
  • Income of a person may vary from Rs 10K to Rs 2 lakh
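The slide names the problem but not a specific fix; one standard remedy is min-max scaling, which maps every attribute to [0, 1]. A minimal sketch using the slide's example ranges:

```python
import numpy as np

# Toy records: [height (m), weight (kg), income (Rs)], ranges from the slide
X = np.array([
    [1.5,  60.0,  10_000.0],
    [1.8, 100.0, 200_000.0],
    [1.6,  75.0,  50_000.0],
])

# Min-max scaling maps each attribute to [0, 1], so income
# (range ~Rs 1.9 lakh) no longer swamps height (range 0.3 m)
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```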
Nearest Neighbor Classification…
• Problem with the Euclidean measure:
– High-dimensional data: due to the curse of dimensionality, all vectors become almost equidistant to the query vector
– It can also produce undesirable results, e.g. for binary vectors:
  1 1 1 1 1 1 1 1 1 1 1 0  vs  0 1 1 1 1 1 1 1 1 1 1 1   d = 1.4142
  1 0 0 0 0 0 0 0 0 0 0 0  vs  0 0 0 0 0 0 0 0 0 0 0 1   d = 1.4142
  Both pairs are equally far apart, even though the first pair shares ten 1s and the second pair shares none
• Solution: normalize the vectors to unit length
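A quick check of the example above, plus the effect of normalizing to unit length; the numbers in the comments are what this sketch prints:

```python
import numpy as np

a = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0], dtype=float)
b = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
c = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
d = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=float)

dist = lambda u, v: np.sqrt(np.sum((u - v) ** 2))
print(dist(a, b), dist(c, d))  # both ~1.4142, despite very different overlap

# After normalizing to unit length, the two cases separate
unit = lambda v: v / np.sqrt(np.sum(v ** 2))
print(dist(unit(a), unit(b)))  # ~0.4264: a and b point in nearly the same direction
print(dist(unit(c), unit(d)))  # ~1.4142: c and d remain orthogonal
```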
Nearest Neighbor Classification…
• k-NN classifiers are lazy learners:
– They do not build a model explicitly, unlike eager learners such as decision tree induction and rule-based systems
– As a result, classifying unknown records is relatively expensive
Thank You