Model Selection


  1. 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1

  2. Q&A Q: How do we deal with ties in k-Nearest Neighbors (e.g. even k or equidistant points)? A: I would ask you all for a good solution! Q: How do we define a distance function when the features are categorical (e.g. weather takes values {sunny, rainy, overcast})? A: Step 1: Convert from categorical attributes to numeric features (e.g. binary) Step 2: Select an appropriate distance function (e.g. Hamming distance) 2

  3. Reminders • Homework 2: Decision Trees – Out: Wed, Jan 24 – Due: Mon, Feb 5 at 11:59pm • 10601 Notation Crib Sheet 3

  4. K-NEAREST NEIGHBORS 7

  5. k-Nearest Neighbors Chalkboard: – KNN for binary classification – Distance functions – Efficiency of KNN – Inductive bias of KNN – KNN Properties 8
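Since the chalkboard material is not reproduced in this transcript, here is a minimal sketch of KNN for binary classification with one distance computation per training point (O(N) prediction). Euclidean distance and an unweighted majority vote are assumed as the standard choices; the chalkboard version may differ.

    import math
    from collections import Counter

    def euclidean(x, z):
        """Euclidean distance between two feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))

    def knn_predict(x, train_X, train_y, k=3):
        """Predict the label of x by majority vote among its k nearest training points.

        Costs O(N) distance computations for N training examples."""
        order = sorted(range(len(train_X)), key=lambda i: euclidean(x, train_X[i]))
        votes = Counter(train_y[i] for i in order[:k])
        return votes.most_common(1)[0][0]

    # Tiny usage example with made-up 2D points and binary labels.
    train_X = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
    train_y = [0, 0, 1, 1]
    print(knn_predict((1.1, 1.0), train_X, train_y, k=3))  # prints 0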

  6. KNN ON FISHER IRIS DATA 9

  7. Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).
Species | Sepal Length | Sepal Width | Petal Length | Petal Width
   0    |     4.3      |     3.0     |     1.1      |     0.1
   0    |     4.9      |     3.6     |     1.4      |     0.1
   0    |     5.3      |     3.7     |     1.5      |     0.2
   1    |     4.9      |     2.4     |     3.3      |     1.0
   1    |     5.7      |     2.8     |     4.1      |     1.3
   1    |     6.3      |     3.3     |     4.7      |     1.6
   1    |     6.7      |     3.0     |     5.0      |     1.7
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

  8. Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).
Deleted two of the four features, so that the input space is 2D.
Species | Sepal Length | Sepal Width
   0    |     4.3      |     3.0
   0    |     4.9      |     3.6
   0    |     5.3      |     3.7
   1    |     4.9      |     2.4
   1    |     5.7      |     2.8
   1    |     6.3      |     3.3
   1    |     6.7      |     3.0
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
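The 2D setup above can be reproduced with a few lines of scikit-learn. This is a sketch under the assumption (suggested by the slide's table) that the two retained features are sepal length and sepal width, which scikit-learn stores as the first two columns; note that scikit-learn's species-to-integer mapping may differ from the slide's.

    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data[:, :2]   # keep sepal length and sepal width only, so the input space is 2D
    y = iris.target        # species labels 0, 1, 2 (mapping may differ from the slide)
    print(X.shape)         # (150, 2)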

  9.–33. KNN on Fisher Iris Data: a run of plot-only slides (figures not captured in this transcript) showing KNN decision boundaries on the 2D iris data, including the labeled special cases Nearest Neighbor and Majority Vote.

  34. KNN ON GAUSSIAN DATA 37

  35.–59. KNN on Gaussian Data: a run of plot-only slides (figures not captured in this transcript) showing KNN results on two-dimensional Gaussian-distributed data.

  60. K-NEAREST NEIGHBORS 63

  61. Questions • How could k-Nearest Neighbors (KNN) be applied to regression? • Can we do better than majority vote? (e.g. distance-weighted KNN) • Where does the Cover & Hart (1967) Bayes error rate bound come from? 64
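As one possible answer to the first two questions (not necessarily the one given in lecture): KNN handles regression by averaging the targets of the k nearest neighbors, and the vote or average can be weighted by inverse distance so that closer neighbors count more. A minimal sketch:

    import math

    def knn_regress(x, train_X, train_y, k=3, eps=1e-12):
        """Distance-weighted KNN regression: weighted average of the k nearest targets."""
        nearest = sorted(
            ((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)),
            key=lambda pair: pair[0],
        )[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]  # closer neighbors get larger weight
        return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)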

  62. KNN Learning Objectives You should be able to… • Describe a dataset as points in a high dimensional space [CIML] • Implement k-Nearest Neighbors with O(N) prediction • Describe the inductive bias of a k-NN classifier and relate it to feature scale [a la. CIML] • Sketch the decision boundary for a learning algorithm (compare k-NN and DT) • State Cover & Hart (1967)'s large sample analysis of a nearest neighbor classifier • Invent "new" k-NN learning algorithms capable of dealing with even k • Explain computational and geometric examples of the curse of dimensionality 65
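For the last objective, a quick numeric illustration (not from the slides) of the geometric curse of dimensionality: for random points in the unit hypercube, the nearest and farthest points from a query become nearly equidistant as the dimension grows, which is exactly the regime in which nearest-neighbor distinctions stop being informative.

    import math
    import random

    random.seed(0)
    for d in (2, 10, 100, 1000):
        query = [0.5] * d
        points = [[random.random() for _ in range(d)] for _ in range(100)]
        dists = [math.dist(query, p) for p in points]
        # The min/max distance ratio creeps toward 1 as d grows.
        print(d, round(min(dists) / max(dists), 3))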

  63. k-Nearest Neighbors But how do we choose k? 66

  64. MODEL SELECTION 67

  65. Model Selection WARNING: • In some sense, our discussion of model selection is premature. • The models we have considered thus far are fairly simple. • The models and the many decisions available to the data scientist wielding them will grow to be much more complex than what we’ve seen so far. 68

  66. Model Selection
Statistics:
• Def: a model defines the data generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning:
• Def: (loosely) a model defines the hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. the search for good parameters)
• Def: hyperparameters are the tunable aspects of the model that the learning algorithm does not select

  67. Model Selection (Example: Decision Tree)
(Machine Learning definitions repeated from the previous slide, side by side with the example.)
• model = set of all possible trees, possibly restricted by some hyperparameters (e.g. max depth)
• parameters = structure of a specific decision tree
• learning algorithm = ID3, CART, etc.
• hyperparameters = max-depth, threshold for splitting criterion, etc.

  68. Model Selection (Example: k-Nearest Neighbors)
(Machine Learning definitions repeated from the previous slide, side by side with the example.)
• model = set of all possible nearest neighbors classifiers
• parameters = none (KNN is an instance-based or non-parametric method)
• learning algorithm = for the naïve setting, just storing the data
• hyperparameters = k, the number of neighbors to consider

  69. Model Selection (Example: Perceptron)
(Machine Learning definitions repeated from the previous slide, side by side with the example.)
• model = set of all linear separators
• parameters = vector of weights (one for each feature)
• learning algorithm = mistake-based updates to the parameters
• hyperparameters = none (unless using some variant such as averaged perceptron)
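These slides define hyperparameters but stop short of showing how to pick them. A common approach, stated here as an assumption rather than as this lecture's method, is to evaluate each candidate value on held-out data and keep the best one. A minimal sketch for KNN's k, reusing the knn_predict function from the earlier sketch:

    def choose_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7, 9)):
        """Pick the k with the highest accuracy on a held-out validation set."""
        def accuracy(k):
            preds = [knn_predict(x, train_X, train_y, k) for x in val_X]
            return sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
        return max(candidates, key=accuracy)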
