Applied Machine Learning: Some basic concepts
Siamak Ravanbakhsh
COMP 551 (Winter 2020)
Objectives
learning as representation, evaluation and optimization
k-nearest neighbors for classification
curse of dimensionality
manifold hypothesis
overfitting & generalization
cross validation
no free lunch theorem
inductive bias
A useful perspective on ML
Let's focus on classification.
Learning = Representation + Evaluation + Optimization
Representation (model, hypothesis space): the space of functions to choose from; it is determined by how we represent/define the learner.
Evaluation (objective function, cost function, loss, score function): the criteria for picking the best model.
Optimization: the procedure for finding the best model.
from: Domingos, Pedro M. "A few useful things to know about machine learning." Communications of the ACM 55.10 (2012): 78-87.
Digits dataset
input: $x^{(n)} \in \{0, \ldots, 255\}^{28 \times 28}$ (28 × 28 is the size of the input image in pixels)
label: $y^{(n)} \in \{0, \ldots, 9\}$
$n \in \{1, \ldots, N\}$ indexes the training instance; sometimes we drop the superscript $(n)$
vectorization: $x \to \mathrm{vec}(x) \in \mathbb{R}^{784}$, pretending intensities are real numbers; 784 is the input dimension D
note: this ignores the spatial arrangement of pixels, but it is good enough for now
image: https://medium.com/@rajatjain0807/machine-learning-6ecde3bfd2f4
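A minimal NumPy sketch (not from the slides) of the vectorization step; the randomly generated `image` is only a stand-in for an actual digit:

```python
import numpy as np

# Stand-in for one 28x28 grayscale digit with integer intensities in {0, ..., 255}.
image = np.random.randint(0, 256, size=(28, 28))

# vec(x): flatten to a vector in R^784, pretending intensities are real numbers.
x = image.astype(np.float64).reshape(-1)
print(x.shape)  # (784,)
```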
Nearest neighbour classifier
training: do nothing
test: predict the label of a new test instance by finding the closest instance in the training set
this needs a measure of distance, e.g., Euclidean distance: $\|x - x'\|_2 = \sqrt{\sum_{d=1}^{D} (x_d - x'_d)^2}$
in the example shown, the new test instance is classified as 6
the Voronoi diagram shows the decision boundaries (the example has D = 2; we can't visualize D = 784)
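A possible NumPy sketch of this 1-nearest-neighbour rule (my own illustration, not course code); `nearest_neighbour_predict` and the toy data are made up:

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, x_new):
    """Return the label of the training instance closest to x_new (Euclidean distance)."""
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))  # ||x_new - x^(n)||_2 for every n
    return y_train[np.argmin(dists)]

# toy usage: N=4 training instances, D=2
X_train = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
y_train = np.array([0, 0, 1, 1])
print(nearest_neighbour_predict(X_train, y_train, np.array([4.5, 5.2])))  # -> 1
```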
The Voronoi diagram
each colour shows all points closer to the corresponding training instance than to any other instance
Euclidean distance: $\|x - x'\|_2 = \sqrt{\sum_{d=1}^{D} (x_d - x'_d)^2}$
Manhattan distance: $\|x - x'\|_1 = \sum_{d=1}^{D} |x_d - x'_d|$
images from Wikipedia
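For concreteness, a small sketch of the two distances (assumed NumPy, not from the slides):

```python
import numpy as np

def euclidean(x, xp):
    return np.sqrt(np.sum((x - xp) ** 2))  # ||x - x'||_2

def manhattan(x, xp):
    return np.sum(np.abs(x - xp))          # ||x - x'||_1

x, xp = np.array([0., 0.]), np.array([3., 4.])
print(euclidean(x, xp), manhattan(x, xp))  # 5.0 7.0
```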
K-nearest neighbours (K-NN)
training: do nothing
test: predict the label by finding the K closest instances
probability of class c: $p(y = c \mid x_{\mathrm{new}}) = \frac{1}{K} \sum_{x' \in \mathrm{KNN}(x_{\mathrm{new}})} I(y' = c)$
example: among the K = 9 closest instances to the new test instance, six have label 6, so $p(y = 6 \mid x_{\mathrm{new}}) = 6/9$
K-nearest neighbours example: C = 3 classes, D = 2, K = 10
[figures: the training data, the estimated probability of class 1, and the estimated probability of class 2]
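A minimal sketch of the K-NN probability estimate above (my own NumPy illustration; the clustered toy data mimics the C = 3, D = 2, K = 10 setting but is not the actual dataset):

```python
import numpy as np

def knn_class_probs(X_train, y_train, x_new, K, num_classes):
    """p(y = c | x_new): fraction of the K nearest training instances with label c."""
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    knn_idx = np.argsort(dists)[:K]                          # indices of the K closest instances
    counts = np.bincount(y_train[knn_idx], minlength=num_classes)
    return counts / K                                        # one probability per class

# toy usage: C=3 classes, D=2 features, K=10
rng = np.random.default_rng(0)
y_train = rng.integers(0, 3, size=60)
X_train = rng.normal(scale=0.5, size=(60, 2)) + y_train[:, None]  # class c centred near (c, c)
print(knn_class_probs(X_train, y_train, np.array([1.0, 0.8]), K=10, num_classes=3))
```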
K-nearest neighbours
a non-parametric method (a misnomer): the number of model parameters grows with the data
a lazy learner: there is no training phase; the estimate is computed locally when a query arrives
useful for fast-changing datasets
Curse of dimensionality
high dimensions are unintuitive! assume a uniform distribution over $x \in [0, 1]^D$
K-NN needs exponentially more instances: suppose we want to maintain a fixed number of samples per sub-cube of side 1/3; then N (the total number of training instances) grows exponentially with D (the number of dimensions)
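A back-of-the-envelope check (the target of 10 samples per sub-cube is my own arbitrary choice): there are $3^D$ sub-cubes of side 1/3, so the required N explodes with D.

```python
# N needed to keep ~10 samples in each of the 3**D sub-cubes of side 1/3
for D in [1, 2, 3, 10, 20]:
    print(D, 10 * 3 ** D)  # 30, 90, 270, 590490, 34867844010
```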
Another way to see this: the side length s of a sub-cube needed to capture a fixed fraction of the data grows with D (for a fraction p, $s = p^{1/D}$, which approaches 1), so a "local" neighbourhood is hardly local at all
All instances have similar distances
volume of a ball of radius r in D dimensions: $\frac{2 r^D \pi^{D/2}}{D\,\Gamma(D/2)}$; volume of the enclosing cube of side 2r: $(2r)^D$
$\lim_{D \to \infty} \frac{\mathrm{volume(ball)}}{\mathrm{volume(cube)}} = 0$: most of the volume is close to the corners
most pairwise distances become similar (the figure illustrates D = 3)
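Two quick numerical checks of these claims (a sketch assuming NumPy and SciPy are available; the sample sizes are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.special import gamma

# 1) pairwise distances between uniform samples in [0,1]^D concentrate as D grows
rng = np.random.default_rng(0)
for D in [2, 10, 100, 1000]:
    X = rng.uniform(size=(200, D))
    d = pdist(X)                      # all pairwise Euclidean distances
    print(D, d.std() / d.mean())      # relative spread shrinks with D

# 2) volume(ball of radius r) / volume(cube of side 2r) -> 0  (the r^D factors cancel)
for D in [2, 3, 10, 20]:
    print(D, 2 * np.pi ** (D / 2) / (D * gamma(D / 2)) / 2 ** D)
```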
A "conceptual" visualization of the same example: the number of corners, and the mass in the corners, grow quickly with D
image: Zaki's book on Data Mining and Analysis
Manifold hypothesis
real-world data is often far from uniform
manifold hypothesis: real data lies close to the surface of a (low-dimensional) manifold