  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics
Optimal separating hyperplane. Basis expansion. Kernel trick. Support vector machine.
Petr Pošík © 2015

  2. Rehearsal

  3. Linear discrimination function
Binary classification of objects x (classification into 2 classes, a dichotomy):
■ For 2 classes, 1 discrimination function is enough.
■ Decision rule: $\hat y^{(i)} = +1 \iff f(x^{(i)}) > 0$ and $\hat y^{(i)} = -1 \iff f(x^{(i)}) < 0$, i.e. $\hat y^{(i)} = \mathrm{sign}\, f(x^{(i)})$.
Learning of the linear discrimination function by the perceptron algorithm:
■ Optimization of $J(w, T) = \sum_{i=1}^{|T|} I\big(\hat y^{(i)} \neq y^{(i)}\big)$, i.e. the number of misclassified training examples.
■ The weight vector is a weighted sum of the training points $x^{(i)}$.
■ The perceptron finds some separating hyperplane, if one exists (a sketch of the algorithm follows below).
■ Among the infinite number of separating hyperplanes, which one is the best?
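For concreteness, here is a minimal NumPy sketch of the perceptron update described above; the function name and the max_epochs stopping rule are illustrative choices, not part of the slides.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron learning. X: (n, D) data matrix, y: labels in {-1, +1}.
    Returns weights w (D,) and bias w0; stops early once all points are correct."""
    n, D = X.shape
    w, w0 = np.zeros(D), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + w0) <= 0:   # misclassified (or exactly on the boundary)
                w += y_i * x_i              # w remains a weighted sum of training points
                w0 += y_i
                errors += 1
        if errors == 0:                     # found some separating hyperplane
            break
    return w, w0
```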

  4. Optimal separating hyperplane

  5. Optimal separating hyperplane
Support vectors:
■ Data points x lying at the plus 1 level or the minus 1 level.
■ Only these points influence the decision boundary!
Margin (cz: odstup):
■ "The width of the band in which the decision boundary can move (in the direction of its normal vector) without touching any data point."
Maximum margin linear classifier:
■ Plus 1 level: $\{x : xw^T + w_0 = 1\}$
■ Decision boundary: $\{x : xw^T + w_0 = 0\}$
■ Minus 1 level: $\{x : xw^T + w_0 = -1\}$
Why would we like to maximize the margin?
■ Intuitively, it is safe.
■ If we make a small error in estimating the boundary, the classification will likely stay correct.
■ The model is invariant with respect to changes of the training set, except changes of the support vectors.
■ There are sound theoretical results (based on the VC dimension) showing that a maximum-margin classifier is good.
■ A maximal margin works well in practice.
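As a small illustration of the ±1 levels, the following sketch flags which data points lie (numerically) on them and are therefore candidate support vectors; the tolerance tol is an arbitrary choice for this sketch.

```python
import numpy as np

def support_vector_candidates(X, w, w0, tol=1e-6):
    """Return the activations x·w + w0 and a mask of points lying on the ±1 levels."""
    act = X @ w + w0
    on_level = np.isclose(np.abs(act), 1.0, atol=tol)   # candidate support vectors
    return act, on_level
```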

  6. Margin size
How do we compute the margin M given $w = (w_1, \ldots, w_D)$ and $w_0$?
■ Choose two points $x^+$ and $x^-$ lying in the plus 1 level and the minus 1 level, respectively, with $x^+ = x^- + \lambda w$ (i.e. $x^+$ is reached from $x^-$ along the normal vector).
■ Compute the margin M as their distance.
We know that:
$x^+ w^T + w_0 = 1$, $x^- w^T + w_0 = -1$, $x^- + \lambda w = x^+$.
And we can derive:
$(x^+ - x^-)\, w^T = 2$
$(x^- + \lambda w - x^-)\, w^T = 2$
$\lambda\, w w^T = 2$
$\lambda = \frac{2}{w w^T} = \frac{2}{\|w\|^2}$
Thus the margin size is
$M = \|x^+ - x^-\| = \|\lambda w\| = \lambda \|w\| = \frac{2}{\|w\|^2}\, \|w\| = \frac{2}{\|w\|}$.
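A quick numerical check of this derivation; the weight vector and bias below are made-up illustrative values. We pick a point on the −1 level, step along the normal by $\lambda = 2/\|w\|^2$, and verify that the distance equals $2/\|w\|$.

```python
import numpy as np

# Illustrative weight vector and bias (not taken from the slides).
w = np.array([3.0, 4.0])
w0 = -2.0

# A point on the minus-1 level, found along the direction of w: solve x·w + w0 = -1.
x_minus = w * (-1.0 - w0) / (w @ w)

# Step along the normal vector by lambda = 2 / (w·w) to reach the plus-1 level.
lam = 2.0 / (w @ w)
x_plus = x_minus + lam * w

print(x_minus @ w + w0)   # -1.0  (minus-1 level)
print(x_plus @ w + w0)    # +1.0  (plus-1 level)
print(np.linalg.norm(x_plus - x_minus), 2.0 / np.linalg.norm(w))   # both 0.4: the margin
```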

  7. Optimal separating hyperplane learning
We want to maximize the margin $M = \frac{2}{\|w\|}$ subject to the constraints ensuring correct classification of the training set T. This optimization problem can be formulated as a quadratic programming (QP) task.
■ Primal QP task:
minimize $w w^T$ with respect to $w_1, \ldots, w_D$
subject to $y^{(i)} (x^{(i)} w^T + w_0) \geq 1$.
■ Dual QP task:
maximize $\sum_{i=1}^{|T|} \alpha_i - \frac{1}{2} \sum_{i=1}^{|T|} \sum_{j=1}^{|T|} \alpha_i \alpha_j y^{(i)} y^{(j)} x^{(i)} {x^{(j)}}^T$ with respect to $\alpha_1, \ldots, \alpha_{|T|}$
subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{|T|} \alpha_i y^{(i)} = 0$.
■ From the solution of the dual task, we can compute the solution of the primal task (see the solver sketch below):
$w = \sum_{i=1}^{|T|} \alpha_i y^{(i)} x^{(i)}$, $\quad w_0 = y^{(k)} - x^{(k)} w^T$,
where $(x^{(k)}, y^{(k)})$ is any support vector, i.e. any example with $\alpha_k > 0$.
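A minimal sketch of solving the dual QP numerically, assuming the cvxopt package is available (the slides do not prescribe a particular solver). The dual is rewritten in the solver's standard form: minimize $\frac{1}{2}\alpha^T P \alpha - \mathbf{1}^T \alpha$ with $P_{ij} = y^{(i)} y^{(j)} x^{(i)} {x^{(j)}}^T$, subject to $-\alpha \le 0$ and $y^T \alpha = 0$.

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_osh_dual(X, y):
    """Hard-margin linear classifier via the dual QP.
    X: (n, D) array, y: (n,) array with labels in {-1, +1}."""
    n = X.shape[0]
    K = X @ X.T                                          # Gram matrix of dot products
    P = matrix((np.outer(y, y) * K).astype(float))       # P_ij = y_i y_j x_i·x_j
    q = matrix(-np.ones(n))                              # minimize (1/2) a'Pa - 1'a
    G = matrix(-np.eye(n))                               # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.astype(float).reshape(1, -1))           # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

    w = (alpha * y) @ X                                  # w = sum_i alpha_i y_i x_i
    k = np.argmax(alpha)                                 # any support vector (alpha_k > 0)
    w0 = y[k] - X[k] @ w
    return w, w0, alpha
```

In practice one would usually call a library SVM implementation (e.g. sklearn.svm.SVC with a linear kernel and a large C to approximate the hard margin) rather than a hand-rolled QP.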

  8. Optimal separating hyperplane: concluding remarks
The importance of the dual formulation:
■ The QP task in the dual formulation is easier for QP solvers than the primal formulation.
■ New, unseen examples can be classified using the function
$f(x, w, w_0) = \mathrm{sign}(x w^T + w_0) = \mathrm{sign}\left(\sum_{i=1}^{|T|} \alpha_i y^{(i)} x^{(i)} x^T + w_0\right)$,
i.e. the discrimination function contains the examples x only in the form of dot products (which will be useful later).
■ The examples with $\alpha_i > 0$ are the support vectors, thus the sums may be carried out only over the support vectors.
■ The dual formulation allows for other tricks which you will learn later.
What if the data are not linearly separable?
■ There is a generalization of the QP task formulation for this case (soft margin).
■ The primal task has double the number of constraints; the task is more complex.
■ The results for the QP task with soft margin are of the same type as before.
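A short sketch of this dual-form decision function, keeping only the support vectors; it is written to continue the illustrative fit_osh_dual output from the previous sketch.

```python
import numpy as np

def predict_dual(X_new, X_sv, y_sv, alpha_sv, w0):
    """Classify new points using only the support vectors:
    f(x) = sign( sum_i alpha_i y_i (x_i · x) + w0 )."""
    scores = (alpha_sv * y_sv) @ (X_sv @ X_new.T) + w0
    return np.sign(scores)

# Example of selecting the support vectors from a dual solution (alpha, X, y, w0):
# sv = alpha > 1e-8
# y_hat = predict_dual(X_new, X[sv], y[sv], alpha[sv], w0)
```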

  9. Optimal separating hyperplane: demo
[Figure: 2D demo data with the learned separating hyperplane; both axes span roughly −0.2 to 1.2.]

  10. When a linear decision boundary is not enough...

  11. Basis expansion, a.k.a. feature space straightening
Why?
■ A linear decision boundary (or a linear regression model) may not be flexible enough to perform precise classification (regression).
■ The algorithms for fitting linear models can then be used to fit non-linear models!
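A minimal illustration of the idea; the quadratic feature map below is one common choice, not prescribed by the slides. Each 2-D input is expanded into quadratic features, and any linear classifier (e.g. the perceptron sketched earlier) is then trained in the expanded space, yielding a quadratic boundary in the original space.

```python
import numpy as np

def quadratic_expansion(X):
    """Map 2-D inputs (x1, x2) to the expanded features (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Usage with the earlier perceptron sketch:
# w, w0 = perceptron(quadratic_expansion(X), y)
# y_hat = np.sign(quadratic_expansion(X_new) @ w + w0)
```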
