Support Vector Machines


  1. Support Vector Machines 3-18-16

  2. Reading Quiz Q1: Which of these hyperplanes would be selected by a support vector machine? a) a b) b c) c d) None of these [figure: three candidate separating hyperplanes labeled a, b, and c]

  3. Reading Quiz Q2: Which of these points is a support vector to the hyperplane? a) a b) b c) c d) None of these [figure: points labeled a, b, and c near a separating hyperplane]

  4. Generalizability The goal in machine learning is to find a model that generalizes to unseen data. We can never be certain about how well our model will generalize, but we have some helpful principles available to us. Core principle: a model that is robust to small perturbations generalizes better.

  5. Robustness and overfitting How can we tell that the red model in the slide's figure is overfitting? Cross validation with training/test sets. Perturb an input slightly: how much does the model change?
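
A minimal sketch of both checks, using scikit-learn; the dataset, model, and perturbation scale below are illustrative assumptions, not from the slides:

```python
# Two overfitting checks: cross validation and input perturbation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # hypothetical continuous inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # hypothetical 2-class labels

model = SVC(kernel="linear")

# Check 1: cross validation -- low held-out accuracy suggests overfitting.
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

# Check 2: perturbation -- a robust model's predictions should barely
# change when each input is nudged slightly.
model.fit(X, y)
X_perturbed = X + rng.normal(scale=0.01, size=X.shape)
flipped = np.mean(model.predict(X) != model.predict(X_perturbed))
print("fraction of predictions flipped:", flipped)
```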

  6. Support vector machines: the setting
  ● Supervised learning (we know the correct output for each training point).
  ● 2-class classification (there are exactly two class labels).
  ● Inputs are treated as continuous.

  7. Perceptron decision boundaries A perceptron learns an arbitrary separating hyperplane, and could therefore return any of these decision boundaries. Why is b best? [figure: three separating hyperplanes labeled a, b, and c]

  8. Max-margin classifier We can find the hyperplane with the most room for error by finding the parallel separating hyperplanes with the largest distance between them and taking their midpoint. This is formulated as a quadratic programming problem. Maximize: the margin between the parallel hyperplanes. Subject to: constraints ensuring that we find separating hyperplanes.
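
Written out, the standard quadratic program for the hard-margin SVM (the slide describes it but leaves the formulas implicit) is, for training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$:

```latex
% Maximizing the margin 2/||w|| is equivalent to minimizing (1/2)||w||^2.
\begin{aligned}
\min_{w,\,b} \quad & \tfrac{1}{2}\,\lVert w \rVert^2 \\
\text{subject to} \quad & y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```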

  9. Soft margin classifier What if the data aren’t linearly separable, but come close? Key idea: loosen the constraints to allow misclassifications, but add a penalty term to the objective. Instead of maximizing the margin, minimize:
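
In the standard soft-margin formulation, the objective the slide points to (margin term plus misclassification penalty) is:

```latex
% Slack variables xi_i allow violations of the margin constraints,
% and the constant C controls the penalty for them.
\begin{aligned}
\min_{w,\,b,\,\xi} \quad & \tfrac{1}{2}\,\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
\end{aligned}
```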

  10. Changing the basis Changing bases means picking a different set of coordinates in which to represent the data. Here is the simplest sort of basis change:
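
The slide's concrete example did not survive extraction; a plausible guess for the "simplest sort" of basis change is a linear one, such as rescaling the coordinates:

```latex
% A guessed stand-in for the lost example: a simple linear rescaling.
(x_1, x_2) \;\to\; (a\,x_1,\; b\,x_2)
```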

  11. A more interesting change of basis: $(x_1, x_2) \to (x_1^2, x_2^2)$
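
As an illustration (my own, not from the slides) of why this map helps: a circular class boundary $x_1^2 + x_2^2 = r^2$ becomes the line $u + v = r^2$ in the new coordinates $(u, v) = (x_1^2, x_2^2)$, so the classes become linearly separable:

```python
# Inside-vs-outside-a-circle labels: nonlinear in (x1, x2),
# but linear in (u, v) = (x1^2, x2^2).
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)  # unit-circle labels

U = X**2  # change of basis: (x1, x2) -> (x1^2, x2^2)

# In the new basis the boundary is the line u + v = 1.
pred = (U[:, 0] + U[:, 1] < 1).astype(int)
print("agreement with a linear boundary in the new basis:",
      np.mean(pred == y))  # prints 1.0
```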

  12. Criteria for choosing a basis
  ● We want the data to be (almost) linearly separable in the basis we choose.
  ● We want the basis to have relatively low dimension.
  ● Kernel trick (see the sketch after this list)
  ○ We can get away with a high-dimension basis if we don't have to explicitly convert the data into the new basis.
  ○ Sometimes it's easy to compute dot products in the new basis without actually transforming the inputs.
  ○ The distances in the SVM formula can be expressed with dot products.
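
A minimal sketch of the kernel trick for a degree-2 polynomial map (an illustrative example, not from the slides): the kernel $k(x, z) = (x \cdot z)^2$ equals the dot product after mapping $x \to (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so we never have to apply the map explicitly:

```python
# The kernel k(x, z) = (x . z)^2 computes the dot product in the
# feature space phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) directly.
import numpy as np

def phi(x):
    """Explicit feature map into the degree-2 monomial basis."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, z):
    """Kernel: the same dot product, computed in the original basis."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # explicit transform: 1.0
print(k(x, z))                 # kernel trick:       1.0
```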

  13. Changing basis in other machine learning algorithms
  ● The kernel trick can apply to perceptrons, k-nearest neighbors, etc.
  ● We can do polynomial regression by changing basis and then doing linear regression in the polynomial basis (see the sketch below).
  ● Gaussian process regression.
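
A sketch of the polynomial-regression idea (the data and degree below are made up for illustration): change to a polynomial basis, then run ordinary linear least squares in that basis:

```python
# Polynomial regression as "change basis, then do linear regression":
# map x -> (x^3, x^2, x, 1) and fit ordinary least squares.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 50)
y = 2 * x**3 - x + rng.normal(scale=0.05, size=x.shape)  # hypothetical data

degree = 3
B = np.vander(x, degree + 1)  # polynomial basis: columns x^3, x^2, x, 1

coeffs, *_ = np.linalg.lstsq(B, y, rcond=None)  # plain linear regression
print("fitted coefficients (x^3, x^2, x, 1):", np.round(coeffs, 2))
# Expect approximately (2, 0, -1, 0).
```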
