For Friday • No reading • Program 3 due
Program 3 • Any questions?
Basic Concept • Not creating a generalization • Instead, memorizing examples and classifying based on the “closest” example(s) • Advantages? • Disadvantages?
Two Questions to Answer • What does it mean to be close? • How do we classify an example once we know what’s close?
Similarity/Distance Metrics • Instance-based methods assume a function for determining the similarity or distance between any two instances. • For continuous feature vectors, Euclidean distance is the generic choice: $d(x_i, x_j) = \sqrt{\sum_{p=1}^{n} \left(a_p(x_i) - a_p(x_j)\right)^2}$, where $a_p(x)$ is the value of the $p$-th feature of instance $x$. • For discrete features, assume the distance between two values is 0 if they are the same and 1 if they are different (e.g. Hamming distance for bit vectors). • To compensate for differences in units across features, scale continuous values to the interval [0,1].
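A minimal Python sketch of this scaled Euclidean distance (NumPy assumed; the function names are illustrative, not from the slides):

```python
import numpy as np

def min_max_scale(X):
    """Scale each continuous feature to the interval [0, 1] so no feature dominates the distance."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return (X - lo) / span

def euclidean(x_i, x_j):
    """d(x_i, x_j) = sqrt(sum_p (a_p(x_i) - a_p(x_j))^2)."""
    return np.sqrt(np.sum((x_i - x_j) ** 2))
```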
Other Distance Metrics • Mahalanobis distance – Scale-invariant metric that normalizes for variance. • Cosine Similarity – Cosine of the angle between the two vectors. – Used in text and other high-dimensional data. • Pearson correlation – Standard statistical correlation coefficient. – Used for bioinformatics data. • Edit distance – Used to measure distance between unbounded length strings. – Used in text and bioinformatics.
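Two of these metrics are simple enough to sketch directly (a rough illustration, assuming NumPy; not part of the slides):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 1.0 means identical direction."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_correlation(u, v):
    """Standard statistical correlation coefficient between two feature vectors."""
    return np.corrcoef(u, v)[0, 1]
```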
K-Nearest Neighbor • Find the distance to all training examples • Pick the k closest • Pick the majority class among them • Use an odd value of k (avoids ties when there are two classes)
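A brute-force sketch of this procedure (assumed Python/NumPy; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among the k nearest training examples."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # distance to all training examples
    nearest = np.argsort(dists)[:k]                          # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                        # majority class
```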
5-Nearest Neighbor Example
Implicit Classification Function • Although it is not necessary to explicitly calculate it, the learned classification rule is based on regions of the feature space closest to each training example. • For 1-nearest neighbor with Euclidean distance, the Voronoi diagram gives the convex polyhedra segmenting the space into the regions closest to each point.
Costs • What’s expensive here? • How do we improve that?
Better Indexing • kd-tree • What’s the idea? • There are other approaches to indexing for other metrics or data types
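A sketch of the idea: build the index once, then answer nearest-neighbor queries without scanning every training example. This uses SciPy's KDTree (an assumption; any kd-tree implementation behaves similarly):

```python
import numpy as np
from scipy.spatial import KDTree  # assumes SciPy is available

rng = np.random.default_rng(0)
X_train = rng.random((1000, 4))      # hypothetical training set

tree = KDTree(X_train)               # build once: recursively split the space on feature values
dists, idxs = tree.query(rng.random(4), k=5)   # query the 5 nearest; fast in low dimensions
```

The payoff is that each query avoids the full linear scan, which is the expensive step in plain k-NN; the benefit shrinks as the number of features grows.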
Nearest Neighbor Variations • Can be used to estimate the value of a real-valued function (regression) by taking the average function value of the k nearest neighbors to an input point. • All training examples can be used to help classify a test instance by giving every training example a vote that is weighted by the inverse square of its distance from the test instance.
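Both variations combine neighbors' values with distance-based weights; a minimal sketch of distance-weighted k-NN regression (assumed Python/NumPy, illustrative names):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_query, k=5, eps=1e-9):
    """Predict a real value as the inverse-square-distance-weighted average of the k nearest neighbors."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** 2 + eps)    # inverse square of distance; eps avoids division by zero
    return np.sum(w * y_train[nearest]) / np.sum(w)
```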
Feature Relevance and Weighting • Standard distance metrics weight each feature equally when determining similarity. – Problematic if many features are irrelevant, since similarity along many irrelevant dimensions could mislead the classification. • Features can be weighted by some measure that indicates their ability to discriminate the category of an example, such as information gain. • Overall, instance-based methods favor global similarity over concept simplicity. [Figure: training data labeled + and –, with an unlabeled test instance marked ??]
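One way to realize this is to fold per-feature relevance weights into the distance itself (a sketch under the assumption that weights such as information gain have already been computed):

```python
import numpy as np

def weighted_euclidean(x_i, x_j, feature_weights):
    """Euclidean distance where each feature is scaled by a relevance weight
    (e.g., its information gain), so irrelevant features contribute less."""
    diff = x_i - x_j
    return np.sqrt(np.sum(feature_weights * diff ** 2))
```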
Rules and Instances in Human Learning Biases • Psychological experiments show that people from different cultures exhibit distinct categorization biases. • “Western” subjects favor simple rules (straight stem) and classify the target object in group 2. • “Asian” subjects favor global similarity and classify the target object in group 1.
Other Issues • Can reduce storage of training instances to a small set of representative examples. – Support vectors in an SVM are somewhat analogous. • Can hybridize with rule-based methods or neural-net methods. – Radial basis functions in neural nets and Gaussian kernels in SVMs are similar. • Can be used for more complex relational or graph data. – Similarity computation is complex since it involves some sort of graph isomorphism. • Can be used in problems other than classification. – Case-based planning – Case-based reasoning in law and business.