Data Mining: Support Vector Machines
Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar


  1. Support Vector Machines
     • Find a linear hyperplane (decision boundary) that will separate the data

  2. Support Vector Machines
     • One possible solution
     • Another possible solution

  3. Support Vector Machines
     • Other possible solutions
     • Which one is better, B1 or B2?
     • How do you define "better"?

  4. Support Vector Machines
     • Find the hyperplane that maximizes the margin => B1 is better than B2
     • Decision boundary: w · x + b = 0, with margin hyperplanes w · x + b = 1 and w · x + b = -1
     • f(x) = 1 if w · x + b ≥ 1; f(x) = -1 if w · x + b ≤ -1
     • Margin = 2 / ||w||
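The margin expression above can be checked numerically. The sketch below (with a made-up weight vector w and bias b, not values from the slides) constructs one point on each margin hyperplane and confirms that their distance equals 2 / ||w||.

```python
import numpy as np

# Hypothetical weight vector and bias, chosen only for illustration.
w = np.array([2.0, 1.0])
b = -3.0

# Margin between the hyperplanes w . x + b = 1 and w . x + b = -1.
margin = 2.0 / np.linalg.norm(w)

# Move from a point on the decision boundary along the normal direction w
# to reach one point on each margin hyperplane.
x0 = -b * w / np.dot(w, w)       # satisfies w . x0 + b = 0
x_plus = x0 + w / np.dot(w, w)   # satisfies w . x_plus + b = +1
x_minus = x0 - w / np.dot(w, w)  # satisfies w . x_minus + b = -1

# Both quantities equal 2 / ||w||.
print(margin, np.linalg.norm(x_plus - x_minus))
```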

  5. Linear SVM
     • Linear model: f(x) = 1 if w · x + b ≥ 1; f(x) = -1 if w · x + b ≤ -1
     • Learning the model is equivalent to determining the values of w and b
       – How to find w and b from the training data?
     Learning Linear SVM
     • Objective is to maximize the margin: Margin = 2 / ||w||
       – Which is equivalent to minimizing: L(w) = ||w||^2 / 2
       – Subject to the following constraints:
         w · x_i + b ≥ 1 if y_i = 1
         w · x_i + b ≤ -1 if y_i = -1
         or, equivalently, y_i (w · x_i + b) ≥ 1, i = 1, 2, ..., N
     • This is a constrained optimization problem
       – Solve it using the Lagrange multiplier method
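As a rough illustration of this formulation (not part of the slides), the following sketch uses scikit-learn's SVC with a linear kernel and a very large C to approximate the hard-margin problem on a small made-up data set, then reads off w and b and checks the constraint y_i (w · x_i + b) ≥ 1.

```python
import numpy as np
from sklearn.svm import SVC

# Small, linearly separable toy data set (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.2],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem above:
# minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("w =", w, "b =", b)
print("margin =", 2.0 / np.linalg.norm(w))
# The constraints hold (up to numerical tolerance) on the training data.
print("min y_i (w . x_i + b) =", np.min(y * (X @ w + b)))
```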

  6. Example of Linear SVM
     • Support vectors are the two rows with nonzero Lagrange multiplier λ:

       x1      x2      y    λ
       0.3858  0.4687   1   65.5261
       0.4871  0.6110  -1   65.5261
       0.9218  0.4103  -1   0
       0.7382  0.8936  -1   0
       0.1763  0.0579   1   0
       0.4057  0.3529   1   0
       0.9355  0.8132  -1   0
       0.2146  0.0099   1   0

     Learning Linear SVM
     • The decision boundary depends only on the support vectors
       – If you have a data set with the same support vectors, the decision boundary will not change
     • How to classify using SVM once w and b are found? Given a test record x_i:
       f(x_i) = 1 if w · x_i + b ≥ 1; f(x_i) = -1 if w · x_i + b ≤ -1
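Assuming the fourth column of the table is the Lagrange multiplier λ_i (only the two support vectors have λ_i > 0), w and b can be recovered as w = Σ_i λ_i y_i x_i and b = y_k - w · x_k for any support vector x_k, and a test record is then assigned by the sign of w · x + b. A small numpy sketch using the table's values:

```python
import numpy as np

# Support vectors from the table above (rows with nonzero lambda); the fourth
# column is read here as the Lagrange multiplier lambda_i.
sv = np.array([[0.3858, 0.4687],
               [0.4871, 0.6110]])
y_sv = np.array([1, -1])
lam = np.array([65.5261, 65.5261])

# w = sum_i lambda_i * y_i * x_i;  b = y_k - w . x_k for any support vector x_k.
w = (lam * y_sv) @ sv
b = y_sv[0] - w @ sv[0]
print("w =", w, "b =", b)

# Assign a (hypothetical) test record by the sign of w . x + b.
x_test = np.array([0.2, 0.1])
print("prediction:", 1 if w @ x_test + b > 0 else -1)
```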

  7. Support Vector Machines
     • What if the problem is not linearly separable?
       – Introduce slack variables ξ_i
       – Need to minimize: L(w) = ||w||^2 / 2 + C Σ_{i=1..N} ξ_i^k
       – Subject to:
         w · x_i + b ≥ 1 - ξ_i if y_i = 1
         w · x_i + b ≤ -1 + ξ_i if y_i = -1
       – If k is 1 or 2, this leads to a similar objective function as the linear SVM, but with different constraints (see textbook)
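A hedged illustration of the soft-margin trade-off: in scikit-learn's SVC the parameter C plays the role of the slack penalty above (with k = 1), so a small C tolerates more margin violations and typically yields a wider margin with more support vectors. The data below is randomly generated toy data, not from the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs: not linearly separable (toy data, made up).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [1.5, 1.5]])
y = np.hstack([-np.ones(50), np.ones(50)])

# C corresponds to the slack penalty above (k = 1): small C tolerates large
# slacks (wider margin, more violations), large C penalizes them heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin={margin:.3f}, support vectors={len(clf.support_)}")
```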

  8. Support Vector Machines
     • Find the hyperplane that optimizes both factors (margin width and the slack penalty)
     Nonlinear Support Vector Machines
     • What if the decision boundary is not linear?

  9. Nonlinear Support Vector Machines
     • Transform the data into a higher-dimensional space
       – Decision boundary: w · Φ(x) + b = 0
     Learning Nonlinear SVM
     • Optimization problem: same formulation as for the linear SVM
       – Leads to the same set of equations, but involving Φ(x) instead of x
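A minimal sketch of the idea, using a made-up 1-D problem: points labelled by whether |x| < 1 cannot be separated by any single threshold on x, but after the explicit mapping Φ(x) = (x, x²) they become linearly separable (by the line x² = 1).

```python
import numpy as np
from sklearn.svm import SVC

# 1-D points labelled by |x| < 1: no single threshold on x separates them.
X = np.linspace(-2, 2, 41).reshape(-1, 1)
y = np.where(np.abs(X.ravel()) < 1, 1, -1)

# Explicit mapping phi(x) = (x, x^2): in this 2-D space the classes are
# separated by the linear boundary x^2 = 1.
Phi = np.hstack([X, X ** 2])

acc_original = SVC(kernel="linear", C=1000).fit(X, y).score(X, y)
acc_mapped = SVC(kernel="linear", C=1000).fit(Phi, y).score(Phi, y)
print(acc_original, acc_mapped)  # the mapped version fits the training data perfectly
```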

  10. Learning Nonlinear SVM
     • Issues:
       – What type of mapping function Φ should be used?
       – How to do the computation in the high-dimensional space?
         Most computations involve the dot product Φ(x_i) · Φ(x_j)
         Curse of dimensionality?
     Learning Nonlinear SVM
     • Kernel trick:
       – Φ(x_i) · Φ(x_j) = K(x_i, x_j)
       – K(x_i, x_j) is a kernel function, expressed in terms of the coordinates in the original space
       – Examples: see textbook
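The kernel trick can be verified directly for a degree-2 polynomial kernel: with the explicit map Φ(x) = (x1², x2², √2·x1·x2), the identity Φ(x) · Φ(z) = (x · z)² holds, so the kernel value computed in the original 2-D space equals the dot product in the 3-D feature space. A short numpy check (example vectors are made up):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both values are identical (up to floating-point rounding): the kernel in the
# original 2-D space equals the dot product of the mapped vectors in 3-D.
print(np.dot(phi(x), phi(z)))  # ~1.0
print(poly_kernel(x, z))       # 1.0
```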

  11. Example of Nonlinear SVM
     • SVM with a polynomial kernel of degree 2
     Learning Nonlinear SVM
     • Advantages of using a kernel:
       – Don't have to know the mapping function Φ
       – Computing the dot product Φ(x_i) · Φ(x_j) in the original space avoids the curse of dimensionality
     • Not all functions can be kernels
       – Must make sure there is a corresponding Φ in some high-dimensional space
       – Mercer's theorem (see textbook)
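For a concrete (made-up) analogue of this example, the sketch below fits scikit-learn's SVC with a degree-2 polynomial kernel to two concentric rings of points; the quadratic boundary it induces fits data that a linear SVM cannot separate.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in the input space.
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
poly2 = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)

# The degree-2 polynomial kernel corresponds to a quadratic decision boundary
# (here, roughly a circle), so it fits this data far better than a linear SVM.
print("linear SVM accuracy:", linear.score(X, y))
print("poly-2 SVM accuracy:", poly2.score(X, y))
```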

  12. Characteristics of SVM
     • The learning problem is formulated as a convex optimization problem
       – Efficient algorithms are available to find the global minimum
       – Many other methods use greedy approaches and find locally optimal solutions
       – High computational complexity for building the model
     • Robust to noise
     • Overfitting is handled by maximizing the margin of the decision boundary
     • SVM can handle irrelevant and redundant attributes better than many other techniques
     • The user needs to provide the type of kernel function and the cost function
     • Difficult to handle missing values
     • What about categorical variables?
