10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University

Model Selection
Matt Gormley
Lecture 4, January 29, 2018
Q&A

Q: How do we deal with ties in k-Nearest Neighbors (e.g. even k or equidistant points)?
A: I would ask you all for a good solution!

Q: How do we define a distance function when the features are categorical (e.g. weather takes values {sunny, rainy, overcast})?
A:
Step 1: Convert from categorical attributes to numeric features (e.g. binary).
Step 2: Select an appropriate distance function (e.g. Hamming distance).
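Below is a minimal sketch (mine, not from the lecture) of those two steps in Python: one-hot encode a categorical attribute, then compare examples with Hamming distance. The attribute and its values are just the weather example from the question.

def one_hot(value, vocabulary):
    """Convert a categorical value to a binary indicator vector."""
    return [1 if value == v else 0 for v in vocabulary]

def hamming_distance(x, y):
    """Count positions at which two equal-length feature vectors differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

# Step 1: categorical -> binary features (the weather attribute from the question)
weather_values = ["sunny", "rainy", "overcast"]
a = one_hot("sunny", weather_values)     # [1, 0, 0]
b = one_hot("overcast", weather_values)  # [0, 0, 1]

# Step 2: an appropriate distance for binary features
print(hamming_distance(a, b))            # 2 (the two vectors differ in two positions)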
Reminders
• Homework 2: Decision Trees
  – Out: Wed, Jan 24
  – Due: Mon, Feb 5 at 11:59pm
• 10601 Notation Crib Sheet
K-NEAREST NEIGHBORS
k-Nearest Neighbors
Chalkboard:
– KNN for binary classification
– Distance functions
– Efficiency of KNN
– Inductive bias of KNN
– KNN Properties
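As a concrete reference for these chalkboard topics, here is a brute-force KNN classifier sketch in Python (an illustration, not the course's reference implementation). It assumes Euclidean distance and majority vote; training just stores the data, and each prediction scans all N stored points, i.e. O(N) distance computations.

import numpy as np
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Learning" is just memorizing the training set.
        self.X = np.asarray(X, dtype=float)  # (N, D) training inputs
        self.y = np.asarray(y)               # (N,) training labels
        return self

    def predict_one(self, x):
        # Euclidean distance to every stored point: the inductive bias is
        # that nearby points in feature space should share a label.
        dists = np.linalg.norm(self.X - x, axis=1)
        nearest = np.argsort(dists)[: self.k]
        # Majority vote among the k nearest labels (ties broken arbitrarily).
        return Counter(self.y[nearest]).most_common(1)[0][0]

    def predict(self, X):
        return np.array([self.predict_one(x) for x in np.asarray(X, dtype=float)])

# Toy usage on made-up 2D binary data:
X_train = [[0, 0], [1, 1], [4, 4], [5, 5]]
y_train = [0, 0, 1, 1]
print(KNNClassifier(k=3).fit(X_train, y_train).predict([[0.5, 0.5], [4.5, 4.5]]))  # [0 1]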
KNN ON FISHER IRIS DATA
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).

Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).

Deleted two of the four features, so that the input space is 2D:

Species  Sepal Length  Sepal Width
0        4.3           3.0
0        4.9           3.6
0        5.3           3.7
1        4.9           2.4
1        5.7           2.8
1        6.3           3.3
1        6.7           3.0

Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
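A quick sketch of getting this 2D version of the data, assuming scikit-learn is available (not part of the slides). Note that sklearn's species coding (0 = setosa, 1 = versicolor, 2 = virginica) differs from the numbering used on the slide.

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, :2]     # keep only sepal length and sepal width
y = iris.target          # 150 species labels
print(X.shape, y.shape)  # (150, 2) (150,)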
KNN on Fisher Iris Data
[Series of figure slides: KNN decision boundaries on the 2D Fisher Iris data for increasing values of k, including the special cases Nearest Neighbor (k = 1) and Majority Vote (k = N, the full training set).]
KNN ON GAUSSIAN DATA
KNN on Gaussian Data
[Series of figure slides: KNN decision boundaries on synthetic 2D data drawn from Gaussian distributions, shown for a range of values of k.]
K-NEAREST NEIGHBORS
Questions
• How could k-Nearest Neighbors (KNN) be applied to regression?
• Can we do better than majority vote? (e.g. distance-weighted KNN)
• Where does the Cover & Hart (1967) Bayes error rate bound come from?
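On the first two questions, here is a hedged sketch (my own, not from the lecture) of KNN regression: average the neighbors' real-valued targets, optionally weighting by distance so closer neighbors count more. The 1/(d + eps) weighting is one common choice among several.

import numpy as np

def knn_regress(X_train, y_train, x, k=3, weighted=True, eps=1e-8):
    """Predict a real-valued target as a (possibly weighted) mean of the k nearest targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if weighted:
        w = 1.0 / (dists[nearest] + eps)          # closer neighbors get larger weights
        return np.sum(w * y_train[nearest]) / np.sum(w)
    return y_train[nearest].mean()                # unweighted: plain average

# Toy 1D regression where y roughly equals x:
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.1, 0.9, 2.1, 3.0]
print(knn_regress(X, y, np.array([1.4]), k=2))    # ~1.38, weighted toward the nearer target 0.9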
KNN Learning Objectives
You should be able to…
• Describe a dataset as points in a high dimensional space [CIML]
• Implement k-Nearest Neighbors with O(N) prediction
• Describe the inductive bias of a k-NN classifier and relate it to feature scale [a la CIML]
• Sketch the decision boundary for a learning algorithm (compare k-NN and DT)
• State Cover & Hart (1967)'s large sample analysis of a nearest neighbor classifier
• Invent "new" k-NN learning algorithms capable of dealing with even k
• Explain computational and geometric examples of the curse of dimensionality
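For the last objective, a small numeric illustration (my own, not from the slides) of one geometric face of the curse of dimensionality: for uniformly random points, the nearest and farthest distances from a query become nearly indistinguishable as the dimension grows, which weakens the neighborhood signal KNN relies on.

import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))                    # 500 uniform random points in [0,1]^d
    q = rng.random(d)                           # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    # Ratio of nearest to farthest distance; it climbs toward 1 as d grows.
    print(d, round(dists.min() / dists.max(), 3))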
k-Nearest Neighbors
But how do we choose k?
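One hedged sketch of the simplest answer (using scikit-learn rather than any course code): hold out a validation set, try several candidate values of k, and keep the one with the highest validation accuracy. The candidate list below is arbitrary.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate k with the highest validation accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        acc = np.mean(clf.predict(X_val) == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Toy usage on the iris data (a single held-out split; not a full cross-validation):
X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
print(choose_k(X_tr, y_tr, X_val, y_val))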
MODEL SELECTION
Model Selection
WARNING:
• In some sense, our discussion of model selection is premature.
• The models we have considered thus far are fairly simple.
• The models and the many decisions available to the data scientist wielding them will grow to be much more complex than what we've seen so far.
Model Selection

Statistics:
• Def: a model defines the data generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters

Machine Learning:
• Def: (loosely) a model defines the hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model that the learning algorithm does not select
Model Selection
Example: Decision Tree
• model = set of all possible trees, possibly restricted by some hyperparameters (e.g. max depth)
• parameters = structure of a specific decision tree
• learning algorithm = ID3, CART, etc.
• hyperparameters = max-depth, threshold for splitting criterion, etc.
Model Selection
Example: k-Nearest Neighbors
• model = set of all possible nearest neighbors classifiers
• parameters = none (KNN is an instance-based or non-parametric method)
• learning algorithm = for the naïve setting, just storing the data
• hyperparameters = k, the number of neighbors to consider
Model Selection
Example: Perceptron
• model = set of all linear separators
• parameters = vector of weights (one for each feature)
• learning algorithm = mistake-based updates to the parameters
• hyperparameters = none (unless using some variant such as averaged perceptron)