  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics
Optimal separating hyperplane. Basis expansion. Kernel trick. Support vector machine.
Petr Pošík © 2015

  2. Rehearsal

  3. Linear discrimination function
Binary classification of objects x (classification into 2 classes, a dichotomy):
■ For 2 classes, 1 discrimination function is enough.
■ Decision rule: $\hat y^{(i)} = +1 \iff f(x^{(i)}) > 0$ and $\hat y^{(i)} = -1 \iff f(x^{(i)}) < 0$, i.e. $\hat y^{(i)} = \mathrm{sign}\, f(x^{(i)})$.
Learning of the linear discrimination function by the perceptron algorithm:
■ Optimization of $J(w, T) = \sum_{i=1}^{|T|} I\big(\hat y^{(i)} \neq y^{(i)}\big)$, i.e. the number of misclassified training examples.
■ The weight vector is a weighted sum of the training points $x^{(i)}$.
■ The perceptron finds some separating hyperplane, if one exists (a sketch of the algorithm follows below).
■ Among the infinite number of separating hyperplanes, which one is the best?
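For concreteness, here is a minimal NumPy sketch of the perceptron update described above; the function name and the max_epochs stopping rule are illustrative choices, not part of the slides.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron learning. X: (n, D) data matrix, y: labels in {-1, +1}.
    Returns weights w (D,) and bias w0; stops early once all points are correct."""
    n, D = X.shape
    w, w0 = np.zeros(D), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + w0) <= 0:   # misclassified (or exactly on the boundary)
                w += y_i * x_i              # w remains a weighted sum of training points
                w0 += y_i
                errors += 1
        if errors == 0:                     # found some separating hyperplane
            break
    return w, w0
```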

  4. Optimal separating hyperplane

  5. Optimal separating hyperplane
Support vectors:
■ Data points x lying at the plus 1 level or the minus 1 level.
■ Only these points influence the decision boundary!
Margin (cz: odstup):
■ "The width of the band in which the decision boundary can move (in the direction of its normal vector) without touching any data point."
Maximum margin linear classifier:
■ Plus 1 level: $\{x : xw^T + w_0 = 1\}$
■ Decision boundary: $\{x : xw^T + w_0 = 0\}$
■ Minus 1 level: $\{x : xw^T + w_0 = -1\}$
Why would we like to maximize the margin?
■ Intuitively, it is safe.
■ If we make a small error in estimating the boundary, the classification will likely stay correct.
■ The model is invariant with respect to changes of the training set, except changes of the support vectors.
■ There are sound theoretical results (based on the VC dimension) showing that a maximum-margin classifier is good.
■ A maximal margin works well in practice.
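As a small illustration of the ±1 levels, the following sketch flags which data points lie (numerically) on them and are therefore candidate support vectors; the tolerance tol is an arbitrary choice for this sketch.

```python
import numpy as np

def support_vector_candidates(X, w, w0, tol=1e-6):
    """Return the activations x·w + w0 and a mask of points lying on the ±1 levels."""
    act = X @ w + w0
    on_level = np.isclose(np.abs(act), 1.0, atol=tol)   # candidate support vectors
    return act, on_level
```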

  6. Margin size
How do we compute the margin M given $w = (w_1, \ldots, w_D)$ and $w_0$?
■ Choose two points $x^+$ and $x^-$ lying in the plus 1 level and the minus 1 level, respectively, with $x^+ = x^- + \lambda w$ (i.e. $x^+$ is reached from $x^-$ along the normal vector).
■ Compute the margin M as their distance.
We know that:
$x^+ w^T + w_0 = 1$, $x^- w^T + w_0 = -1$, $x^- + \lambda w = x^+$.
And we can derive:
$(x^+ - x^-)\, w^T = 2$
$(x^- + \lambda w - x^-)\, w^T = 2$
$\lambda\, w w^T = 2$
$\lambda = \frac{2}{w w^T} = \frac{2}{\|w\|^2}$
Thus the margin size is
$M = \|x^+ - x^-\| = \|\lambda w\| = \lambda \|w\| = \frac{2}{\|w\|^2}\, \|w\| = \frac{2}{\|w\|}$.
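A quick numerical check of this derivation; the weight vector and bias below are made-up illustrative values. We pick a point on the −1 level, step along the normal by $\lambda = 2/\|w\|^2$, and verify that the distance equals $2/\|w\|$.

```python
import numpy as np

# Illustrative weight vector and bias (not taken from the slides).
w = np.array([3.0, 4.0])
w0 = -2.0

# A point on the minus-1 level, found along the direction of w: solve x·w + w0 = -1.
x_minus = w * (-1.0 - w0) / (w @ w)

# Step along the normal vector by lambda = 2 / (w·w) to reach the plus-1 level.
lam = 2.0 / (w @ w)
x_plus = x_minus + lam * w

print(x_minus @ w + w0)   # -1.0  (minus-1 level)
print(x_plus @ w + w0)    # +1.0  (plus-1 level)
print(np.linalg.norm(x_plus - x_minus), 2.0 / np.linalg.norm(w))   # both 0.4: the margin
```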

  7. Optimal separating hyperplane learning
We want to maximize the margin $M = \frac{2}{\|w\|}$ subject to the constraints ensuring correct classification of the training set T. This optimization problem can be formulated as a quadratic programming (QP) task.
■ Primal QP task:
minimize $w w^T$ with respect to $w_1, \ldots, w_D$
subject to $y^{(i)} (x^{(i)} w^T + w_0) \geq 1$.
■ Dual QP task:
maximize $\sum_{i=1}^{|T|} \alpha_i - \frac{1}{2} \sum_{i=1}^{|T|} \sum_{j=1}^{|T|} \alpha_i \alpha_j y^{(i)} y^{(j)} x^{(i)} {x^{(j)}}^T$ with respect to $\alpha_1, \ldots, \alpha_{|T|}$
subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{|T|} \alpha_i y^{(i)} = 0$.
■ From the solution of the dual task, we can compute the solution of the primal task (see the solver sketch below):
$w = \sum_{i=1}^{|T|} \alpha_i y^{(i)} x^{(i)}$, $\quad w_0 = y^{(k)} - x^{(k)} w^T$,
where $(x^{(k)}, y^{(k)})$ is any support vector, i.e. any example with $\alpha_k > 0$.
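A minimal sketch of solving the dual QP numerically, assuming the cvxopt package is available (the slides do not prescribe a particular solver). The dual is rewritten in the solver's standard form: minimize $\frac{1}{2}\alpha^T P \alpha - \mathbf{1}^T \alpha$ with $P_{ij} = y^{(i)} y^{(j)} x^{(i)} {x^{(j)}}^T$, subject to $-\alpha \le 0$ and $y^T \alpha = 0$.

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_osh_dual(X, y):
    """Hard-margin linear classifier via the dual QP.
    X: (n, D) array, y: (n,) array with labels in {-1, +1}."""
    n = X.shape[0]
    K = X @ X.T                                          # Gram matrix of dot products
    P = matrix((np.outer(y, y) * K).astype(float))       # P_ij = y_i y_j x_i·x_j
    q = matrix(-np.ones(n))                              # minimize (1/2) a'Pa - 1'a
    G = matrix(-np.eye(n))                               # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.astype(float).reshape(1, -1))           # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

    w = (alpha * y) @ X                                  # w = sum_i alpha_i y_i x_i
    k = np.argmax(alpha)                                 # any support vector (alpha_k > 0)
    w0 = y[k] - X[k] @ w
    return w, w0, alpha
```

In practice one would usually call a library SVM implementation (e.g. sklearn.svm.SVC with a linear kernel and a large C to approximate the hard margin) rather than a hand-rolled QP.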

  8. Optimal separating hyperplane: concluding remarks
The importance of the dual formulation:
■ The QP task in the dual formulation is easier for QP solvers than the primal formulation.
■ New, unseen examples can be classified using the function
$f(x, w, w_0) = \mathrm{sign}(x w^T + w_0) = \mathrm{sign}\left(\sum_{i=1}^{|T|} \alpha_i y^{(i)} x^{(i)} x^T + w_0\right)$,
i.e. the discrimination function contains the examples x only in the form of dot products (which will be useful later).
■ The examples with $\alpha_i > 0$ are the support vectors, thus the sums may be carried out only over the support vectors.
■ The dual formulation allows for other tricks which you will learn later.
What if the data are not linearly separable?
■ There is a generalization of the QP task formulation for this case (soft margin).
■ The primal task has double the number of constraints; the task is more complex.
■ The results for the QP task with soft margin are of the same type as before.
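A short sketch of this dual-form decision function, keeping only the support vectors; it is written to continue the illustrative fit_osh_dual output from the previous sketch.

```python
import numpy as np

def predict_dual(X_new, X_sv, y_sv, alpha_sv, w0):
    """Classify new points using only the support vectors:
    f(x) = sign( sum_i alpha_i y_i (x_i · x) + w0 )."""
    scores = (alpha_sv * y_sv) @ (X_sv @ X_new.T) + w0
    return np.sign(scores)

# Example of selecting the support vectors from a dual solution (alpha, X, y, w0):
# sv = alpha > 1e-8
# y_hat = predict_dual(X_new, X[sv], y[sv], alpha[sv], w0)
```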

  9. Optimal separating hyperplane: demo
[Figure: 2D demo data with the learned separating hyperplane; both axes span roughly −0.2 to 1.2.]

  10. When a linear decision boundary is not enough...

  11. Basis expansion, a.k.a. feature space straightening
Why?
■ A linear decision boundary (or a linear regression model) may not be flexible enough to perform precise classification (regression).
■ The algorithms for fitting linear models can then be used to fit non-linear models!
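A minimal illustration of the idea; the quadratic feature map below is one common choice, not prescribed by the slides. Each 2-D input is expanded into quadratic features, and any linear classifier (e.g. the perceptron sketched earlier) is then trained in the expanded space, yielding a quadratic boundary in the original space.

```python
import numpy as np

def quadratic_expansion(X):
    """Map 2-D inputs (x1, x2) to the expanded features (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Usage with the earlier perceptron sketch:
# w, w0 = perceptron(quadratic_expansion(X), y)
# y_hat = np.sign(quadratic_expansion(X_new) @ w + w0)
```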
