

  1. Machine Learning: Instance-Based Learning. Hamid Beigy, Sharif University of Technology, Fall 1396.

  2. Table of contents: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.

  3. Outline: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.

  4. Introduction
  1. The methods described before, such as decision trees, Bayesian classifiers, and boosting, first find a hypothesis and then use that hypothesis to classify new test examples.
  2. These methods are called eager learning.
  3. Instance-based learning algorithms such as k-NN store all of the training examples and classify a new example x by finding the training example (x_i, y_i) that is nearest to x according to some distance metric.
  4. Instance-based classifiers do not explicitly compute decision boundaries. However, the boundaries form a subset of the Voronoi diagram of the training data.

  5. Outline: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.

  6. Nearest neighbor algorithms
  1. Fix k ≥ 1 and let S = {(x_1, t_1), ..., (x_N, t_N)} be a labeled sample with t_i ∈ {0, 1}. For every test example x, k-NN returns the hypothesis h defined by
     $$h(x) = \mathbb{I}\left[ \sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i \right],$$
     where the weights w_1, ..., w_N are chosen such that w_i = 1/k if x_i is among the k nearest neighbors of x and w_i = 0 otherwise.
  2. The boundaries form a subset of the Voronoi diagram of the training data.
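A minimal sketch of this voting rule in Python (numpy, the Euclidean metric, and the function name knn_predict are illustrative choices, not from the slides):

```python
import numpy as np

def knn_predict(X_train, t_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples.

    X_train: (N, D) array of training inputs
    t_train: (N,) array of binary labels in {0, 1}
    x: (D,) query point
    """
    # Euclidean (L2) distances from x to every training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nn_idx = np.argsort(dists)[:k]
    # Each neighbor contributes weight 1/k; predict 1 if class-1 weight wins
    return int(np.sum(t_train[nn_idx] == 1) > np.sum(t_train[nn_idx] == 0))

# Example usage
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
t = np.array([0, 0, 1, 1])
print(knn_predict(X, t, np.array([0.8, 0.9]), k=3))  # -> 1
```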

  7. Nearest neighbor algorithms
  1. The k-NN algorithm only requires: an integer k, a set of labeled examples S, and a metric to measure closeness.
  2. For all points x, y, z, a metric d must satisfy the following properties:
     Non-negativity: d(x, y) ≥ 0.
     Reflexivity: d(x, y) = 0 ⇔ x = y.
     Symmetry: d(x, y) = d(y, x).
     Triangle inequality: d(x, y) + d(y, z) ≥ d(x, z).

  8. Distance functions
  1. The Minkowski distance for D-dimensional examples is the L_p norm:
     $$L_p(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p}$$
  2. The Euclidean distance is the L_2 norm:
     $$L_2(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^2 \right)^{1/2}$$
  3. The Manhattan or city-block distance is the L_1 norm:
     $$L_1(x, y) = \sum_{i=1}^{D} |x_i - y_i|$$
  4. The L_∞ norm is the maximum of the distances along the axes:
     $$L_\infty(x, y) = \max_i |x_i - y_i|$$
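A short sketch of these distances in Python (the use of numpy and the helper name minkowski are assumptions for illustration):

```python
import numpy as np

def minkowski(x, y, p):
    """L_p (Minkowski) distance between two D-dimensional vectors."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

print(minkowski(x, y, p=2))     # Euclidean (L2) distance
print(minkowski(x, y, p=1))     # Manhattan / city-block (L1) distance
print(np.max(np.abs(x - y)))    # L_infinity distance (limit as p -> infinity)
```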

  9. Nearest neighbor algorithm for regression
  1. The k-NN algorithm can be adapted for approximating a continuous-valued target function.
  2. We calculate the mean over the k nearest training examples rather than a majority vote:
     $$\hat{f}(x) = \frac{\sum_{i=1}^{k} f(x_i)}{k}$$
  3. [Figure: the effect of k on the performance of the algorithm. Pictures are taken from P. Rai's slides.]
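A minimal sketch of this averaging rule in Python (numpy, the Euclidean metric, and the name knn_regress are assumptions, not from the slides):

```python
import numpy as np

def knn_regress(X_train, f_train, x, k=3):
    """Predict f(x) as the mean target value of the k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nn_idx = np.argsort(dists)[:k]                # indices of the k nearest neighbors
    return np.mean(f_train[nn_idx])               # average their target values

X = np.array([[0.0], [1.0], [2.0], [3.0]])
f = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_regress(X, f, np.array([1.5]), k=2))    # -> (1.0 + 4.0) / 2 = 2.5
```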

  10. Nearest neighbor algorithms
  1. The k-NN algorithm is a lazy learning algorithm:
     It defers finding a hypothesis until a test example x arrives.
     For a test example x, k-NN uses the stored training data.
     It discards the found hypothesis and any intermediate results.
  2. This strategy is the opposite of an eager learning algorithm, which:
     Finds a hypothesis h using the training set.
     Uses the found hypothesis h to classify the test example x.
  3. Trade-offs:
     During the training phase, lazy algorithms have lower computational costs than eager algorithms.
     During the testing phase, lazy algorithms have greater storage requirements and higher computational costs.
  4. What is the inductive bias of k-NN?

  11. Properties of nearest neighbor algorithms
  1. Advantages:
     Analytically tractable.
     Simple implementation.
     Uses local information, which results in highly adaptive behavior.
     Its parallel implementation is very easy.
     Nearly optimal in the large-sample limit (N → ∞): E(Bayes) ≤ E(NN) ≤ 2 × E(Bayes).
  2. Disadvantages:
     Large storage requirements.
     High computational cost during testing.
     Highly susceptible to irrelevant features.
  3. Large values of k:
     Result in smoother decision boundaries.
     Provide more accurate probabilistic information.
  4. But large values of k also:
     Increase the computational cost.
     Destroy the locality of the estimation.

  12. Outline: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.

  13. Distance-weighted nearest neighbor algorithms
  1. One refinement of k-NN is to weight the contribution of each of the k neighbors according to its distance to the query point x.
  2. For two-class classification:
     $$h(x) = \mathbb{I}\left[ \sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i \right], \qquad \text{where } w_i = \frac{1}{d(x, x_i)^2}$$
  3. For C-class classification:
     $$h(x) = \operatorname*{argmax}_{c \in C} \sum_{i=1}^{k} w_i \, \delta(c, t_i)$$
  4. For regression:
     $$\hat{f}(x) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$$
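A sketch of the distance-weighted regression variant in Python (numpy, the eps guard for zero distances, and the name dw_knn_regress are assumptions for illustration):

```python
import numpy as np

def dw_knn_regress(X_train, f_train, x, k=3, eps=1e-12):
    """Distance-weighted k-NN regression: weight each neighbor by 1 / d(x, x_i)^2."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]                  # k nearest neighbors of x
    w = 1.0 / (dists[nn_idx] ** 2 + eps)            # eps guards against d = 0
    return np.sum(w * f_train[nn_idx]) / np.sum(w)  # weighted average of targets

X = np.array([[0.0], [1.0], [2.0]])
f = np.array([0.0, 1.0, 4.0])
print(dw_knn_regress(X, f, np.array([1.2]), k=2))   # closer neighbors count more
```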

  14. Outline: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.

  15. Locally weighted regression
  1. In locally weighted regression (LWR), we use a linear model for the local approximation f̂:
     $$\hat{f}(x) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_D x_D$$
  2. Suppose we aim to minimize the total squared error:
     $$E = \frac{1}{2} \sum_{x \in S} \big(f(x) - \hat{f}(x)\big)^2$$
  3. Using gradient descent:
     $$\Delta w_j = \eta \sum_{x \in S} \big(f(x) - \hat{f}(x)\big) \, x_j$$
     where η is a small number (the learning rate).
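For completeness, the update rule in item 3 follows by differentiating E with respect to w_j (a short derivation; the convention x_0 = 1 for the bias weight w_0 is an assumption made here, not stated on the slide):
$$\frac{\partial E}{\partial w_j} = \sum_{x \in S} \big(f(x) - \hat{f}(x)\big)\,\frac{\partial}{\partial w_j}\big(-\hat{f}(x)\big) = -\sum_{x \in S} \big(f(x) - \hat{f}(x)\big)\, x_j$$
$$\Delta w_j = -\eta\,\frac{\partial E}{\partial w_j} = \eta \sum_{x \in S} \big(f(x) - \hat{f}(x)\big)\, x_j$$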

  16. Locally weighted regression I
  1. How shall we modify this procedure to derive a local approximation rather than a global one?
  2. The simple way is to redefine the error criterion E to emphasize fitting the local training examples.
  3. Three possible criteria are given below. Note that we write the error as E(x_q) to emphasize the fact that the error is now defined as a function of the query point x_q.
     1. Minimize the squared error over just the k nearest neighbors:
        $$E_1(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} \big(f(x) - \hat{f}(x)\big)^2$$
     2. Minimize the squared error over the whole set S of training examples, while weighting the error of each training example by some decreasing function K of its distance from x_q:
        $$E_2(x_q) = \frac{1}{2} \sum_{x \in S} \big(f(x) - \hat{f}(x)\big)^2 \, K(d(x_q, x))$$
     3. Combine 1 and 2:
        $$E_3(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} \big(f(x) - \hat{f}(x)\big)^2 \, K(d(x_q, x))$$

  17. Locally weighted regression II
  4. If we choose criterion three above and re-derive the gradient descent rule, we obtain
     $$\Delta w_j = \eta \sum_{x \in KNN(x_q)} K(d(x_q, x))\,\big(f(x) - \hat{f}(x)\big)\, x_j$$
     where η is a small number (the learning rate).
  5. Criterion two is perhaps the most esthetically pleasing because it allows every training example to have an impact on the classification of x_q.
  6. However, this approach requires computation that grows linearly with the number of training examples.
  7. Criterion three is a good approximation to criterion two and has the advantage that its computational cost is independent of the total number of training examples; the cost depends only on the number k of neighbors considered.
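A rough sketch of this kernel-weighted local update in Python (the Gaussian kernel, the function name lwr_predict, and the gradient-descent hyperparameters are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def lwr_predict(X_train, f_train, x_q, k=10, tau=0.5, eta=0.05, n_steps=1000):
    """Locally weighted regression at query x_q using criterion three:
    gradient descent on the kernel-weighted squared error over the k nearest neighbors."""
    # k nearest neighbors of the query point
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nn = np.argsort(dists)[:k]
    X_nn, f_nn = X_train[nn], f_train[nn]

    # Gaussian kernel K(d) = exp(-d^2 / (2 tau^2)) as the decreasing weight function
    K = np.exp(-dists[nn] ** 2 / (2.0 * tau ** 2))

    # Augment inputs with a constant 1 so w[0] plays the role of the bias w_0
    X_aug = np.hstack([np.ones((k, 1)), X_nn])
    w = np.zeros(X_aug.shape[1])

    for _ in range(n_steps):
        residual = f_nn - X_aug @ w           # f(x) - f_hat(x) for each neighbor
        w += eta * X_aug.T @ (K * residual)   # Delta w_j = eta * sum K(d) (f - f_hat) x_j

    return np.hstack([1.0, x_q]) @ w          # local linear prediction at x_q

# Example usage: approximate f(x) = x^2 locally around x_q = 1.5
X = np.linspace(0, 3, 31).reshape(-1, 1)
f = X.ravel() ** 2
print(lwr_predict(X, f, np.array([1.5])))     # roughly 2.3, near 1.5**2 = 2.25
```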

  18. Outline: 1. Introduction, 2. Nearest neighbor algorithms, 3. Distance-weighted nearest neighbor algorithms, 4. Locally weighted regression, 5. Finding KNN(x) efficiently.
