
Linear models for classification. Perceptron. Logistic regression.



  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics. Linear models for classification. Perceptron. Logistic regression. Petr Pošík, © 2015.

  2. Linear classification

  3. Binary classification task (dichotomy)

Let's have the training dataset T = {(x^(1), y^(1)), ..., (x^(|T|), y^(|T|))}, where
■ each example is described by a vector of features x = (x_1, ..., x_D),
■ each example is labeled with the correct class y ∈ {+1, −1}.

Discrimination function: a function allowing us to decide to which class an example x belongs.
■ For 2 classes, 1 discrimination function is enough.
■ Decision rule: ŷ^(i) = sign(f(x^(i))), i.e. ŷ^(i) = +1 ⟺ f(x^(i)) > 0 and ŷ^(i) = −1 ⟺ f(x^(i)) < 0. (A minimal code sketch of this rule follows below.)
■ Learning then amounts to finding (the parameters of) the function f.

[Figure: two plots of a linear discrimination function f(x) versus x, illustrating the decision rule.]
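Purely for illustration (not part of the slides, with made-up weights), the following sketch shows a linear discrimination function f(x) = w·x + w0 and the sign decision rule:

```python
import numpy as np

def f(x, w, w0):
    """Linear discrimination function f(x) = <w, x> + w0."""
    return np.dot(w, x) + w0

def classify(x, w, w0):
    """Decision rule: y_hat = sign(f(x))."""
    return 1 if f(x, w, w0) > 0 else -1

# Hypothetical weights and a query point, purely for illustration.
w, w0 = np.array([2.0, -1.0]), 0.5
print(classify(np.array([1.0, 3.0]), w, w0))   # f = 2 - 3 + 0.5 = -0.5  ->  -1
```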

  4. Naive approach

Problem: Learn a linear discrimination function f from data T.

  5. Naive approach

Problem: Learn a linear discrimination function f from data T.

Naive solution: fit a linear regression model to the data!
■ Use the cost function
  J_MSE(w, T) = (1/|T|) ∑_{i=1}^{|T|} (y^(i) − f(w, x^(i)))²,
■ minimize it with respect to w,
■ and use ŷ = sign(f(x)).
■ Issue: Points far away from the decision boundary have a huge effect on the model!

  6. Naive approach

Problem: Learn a linear discrimination function f from data T.

Naive solution: fit a linear regression model to the data!
■ Use the cost function
  J_MSE(w, T) = (1/|T|) ∑_{i=1}^{|T|} (y^(i) − f(w, x^(i)))²,
■ minimize it with respect to w,
■ and use ŷ = sign(f(x)).
■ Issue: Points far away from the decision boundary have a huge effect on the model!

Better solution: fit a linear discrimination function which minimizes the number of errors!
■ Cost function:
  J_01(w, T) = (1/|T|) ∑_{i=1}^{|T|} I(y^(i) ≠ ŷ^(i)),
  where I is the indicator function: I(a) returns 1 iff a is True, 0 otherwise.
■ This cost function is non-smooth, contains plateaus, and is not easy to optimize, but there are algorithms which attempt to solve it, e.g. the perceptron, Kozinec's algorithm, etc. (A sketch of both cost functions follows below.)
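A minimal sketch of the naive MSE fit and of the 0/1 error it is compared against. The details are assumptions not given on the slide: examples are rows of a NumPy array X, labels y are in {+1, −1}, a constant-1 column is appended for the bias, and the MSE minimizer is obtained by least squares.

```python
import numpy as np

def fit_mse(X, y):
    """Minimize J_MSE(w, T) = 1/|T| * sum (y_i - f(w, x_i))^2 via least squares."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates
    w, *_ = np.linalg.lstsq(Xh, y, rcond=None)
    return w

def zero_one_error(X, y, w):
    """J_01(w, T): fraction of examples with sign(f(x)) != y."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    y_hat = np.sign(Xh @ w)
    return np.mean(y_hat != y)
```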

  7. Perceptron

  8. Perceptron algorithm

Perceptron [Ros62]:
■ a simple model of a neuron,
■ a linear classifier (in this case, a classifier with a linear discrimination function).

Algorithm 1: Perceptron algorithm
Input: Linearly separable training dataset {x^(i), y^(i)}, x^(i) ∈ R^(D+1) (homogeneous coordinates), y^(i) ∈ {+1, −1}.
Output: Weight vector w such that x^(i) w^T > 0 iff y^(i) = +1 and x^(i) w^T < 0 iff y^(i) = −1.
1  begin
2      Initialize the weight vector, e.g. w = 0.
3      Invert all examples x belonging to class −1: x^(i) = −x^(i) for all i where y^(i) = −1.
4      Find an incorrectly classified training vector, i.e. find j such that x^(j) w^T ≤ 0, e.g. the worst classified vector: x^(j) = argmin_{x^(i)} (x^(i) w^T).
5      if all examples are classified correctly then
6          Return the solution w. Terminate.
7      else
8          Update the weight vector: w = w + x^(j).
9          Go to 4.

(A runnable sketch of this algorithm is given below.)

[Ros62] Frank Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, D.C., 1962.
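A sketch of the algorithm above, under the slide's assumptions: the rows of X are already in homogeneous coordinates and the data are linearly separable (otherwise the loop would not terminate; the max_iter guard is an added safeguard, not part of the original algorithm).

```python
import numpy as np

def perceptron(X, y, max_iter=10000):
    """Perceptron algorithm; X in homogeneous coordinates, y in {+1, -1}."""
    w = np.zeros(X.shape[1])                 # step 2: initialize w = 0
    Z = X * y[:, None]                       # step 3: invert examples of class -1
    for _ in range(max_iter):
        scores = Z @ w
        if np.all(scores > 0):               # steps 5-6: all classified correctly
            return w
        j = np.argmin(scores)                # step 4: the worst classified vector
        w = w + Z[j]                          # step 8: update the weight vector
    raise RuntimeError("No separating hyperplane found within max_iter steps.")
```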

  9. Demo: Perceptron

[Figure: perceptron demo on a 2D dataset, iteration 257.]

  10. Features of the perceptron algorithm

Perceptron convergence theorem [Nov62]:
■ The perceptron algorithm eventually finds a hyperplane that separates 2 classes of points, if such a hyperplane exists.
■ If no separating hyperplane exists, the algorithm does not have to converge and may iterate forever.

Possible solutions:
■ Pocket algorithm: track the error the perceptron makes in each iteration and store the best weights found so far in a separate memory (the pocket). (A sketch of this modification follows below.)
■ Use a different learning algorithm which finds an approximate solution if the classes are not linearly separable.

[Nov62] Albert B. J. Novikoff. On convergence proofs for perceptrons. In Proceedings of the Symposium on Mathematical Theory of Automata, volume 12, Brooklyn, New York, 1962.
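A sketch of the pocket modification; the slide only describes the idea, so the details below (error measured on the whole training set after every update, the update picking the worst misclassified example) are assumptions, not a definitive implementation.

```python
import numpy as np

def pocket_perceptron(X, y, n_iter=1000):
    """Perceptron updates, keeping the best weights seen so far in a 'pocket'."""
    Z = X * y[:, None]                        # examples of class -1 inverted
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.mean(Z @ w <= 0)
    for _ in range(n_iter):
        scores = Z @ w
        wrong = np.where(scores <= 0)[0]
        if wrong.size == 0:
            return w                          # separable: perfect solution found
        w = w + Z[wrong[np.argmin(scores[wrong])]]
        err = np.mean(Z @ w <= 0)
        if err < best_err:                    # store the best weights in the pocket
            best_w, best_err = w.copy(), err
    return best_w
```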

  11. The hyperplane found by the perceptron

The perceptron algorithm
■ finds a separating hyperplane, if it exists;
■ but if a single separating hyperplane exists, then there are infinitely many (equally good) separating hyperplanes,
■ and the perceptron finds any of them!

Which separating hyperplane is the optimal one? What does "optimal" actually mean? (Possible answers in the SVM lecture.)

  12. Logistic regression

  13. Logistic regression model

Problem: Learn a binary classifier for the dataset T = {(x^(i), y^(i))}, where y^(i) ∈ {0, 1}.¹

To reiterate: when using linear regression, the examples far from the decision boundary have a huge impact on h. How to limit their influence?

  14. Logistic regression model

Problem: Learn a binary classifier for the dataset T = {(x^(i), y^(i))}, where y^(i) ∈ {0, 1}.¹

To reiterate: when using linear regression, the examples far from the decision boundary have a huge impact on h. How to limit their influence?

Logistic regression uses a transformation of the values of the linear function:
  h_w(x) = g(x w^T) = 1 / (1 + e^(−x w^T)),
where
  g(z) = 1 / (1 + e^(−z))
is the sigmoid function (a.k.a. the logistic function).

  15. Logistic regression model

Problem: Learn a binary classifier for the dataset T = {(x^(i), y^(i))}, where y^(i) ∈ {0, 1}.¹

To reiterate: when using linear regression, the examples far from the decision boundary have a huge impact on h. How to limit their influence?

Logistic regression uses a transformation of the values of the linear function:
  h_w(x) = g(x w^T) = 1 / (1 + e^(−x w^T)),
where
  g(z) = 1 / (1 + e^(−z))
is the sigmoid function (a.k.a. the logistic function).

Interpretation of the model:
■ h_w(x) estimates the probability that x belongs to class 1.
■ Logistic regression is a classification model!
■ The discrimination function h_w(x) itself is not linear anymore; but the decision boundary is still linear! (A minimal code sketch of the model follows below.)

¹ Previously, we have used y^(i) ∈ {−1, +1}, but the values can be chosen arbitrarily, and {0, 1} is convenient for logistic regression.
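A minimal sketch of the model: the sigmoid g(z) squashes the linear score x w^T into (0, 1), and h_w(x) is read as the estimated probability of class 1. As before, X in homogeneous coordinates is an assumption, and the 0.5 decision threshold is the usual choice, not stated on the slide.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(X, w):
    """h_w(x) = g(x w^T); X has one example per row in homogeneous coordinates."""
    return sigmoid(X @ w)

def predict(X, w, threshold=0.5):
    """Predict class 1 iff the estimated probability exceeds the threshold."""
    return (h(X, w) >= threshold).astype(int)
```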

  16. Cost function

To train the logistic regression model, one can use the J_MSE criterion:
  J(w, T) = (1/|T|) ∑_{i=1}^{|T|} (y^(i) − h_w(x^(i)))².
However, this results in a non-convex, multimodal landscape which is hard to optimize.

  17. Cost function

To train the logistic regression model, one can use the J_MSE criterion:
  J(w, T) = (1/|T|) ∑_{i=1}^{|T|} (y^(i) − h_w(x^(i)))².
However, this results in a non-convex, multimodal landscape which is hard to optimize.

Logistic regression uses a modified cost function
  J(w, T) = (1/|T|) ∑_{i=1}^{|T|} cost(y^(i), h_w(x^(i))), where
  cost(y, ŷ) = −log(ŷ)       if y = 1,
  cost(y, ŷ) = −log(1 − ŷ)   if y = 0,
which can be rewritten in a single expression as
  cost(y, ŷ) = −y log(ŷ) − (1 − y) log(1 − ŷ).
Such a cost function is simpler to optimize. (A gradient-descent sketch using this cost follows below.)
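A sketch of this cost function together with a plain gradient-descent fit. The gradient formula grad J = (1/|T|) X^T (h − y), the learning rate, and the clipping that avoids log(0) are standard additions assumed here, not given on the slide.

```python
import numpy as np

def cost(w, X, y, eps=1e-12):
    """J(w, T) = 1/|T| * sum [ -y log(h) - (1 - y) log(1 - h) ], with y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # h_w(x) for all examples
    p = np.clip(p, eps, 1.0 - eps)            # numerical safeguard against log(0)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def fit(X, y, lr=0.1, n_iter=1000):
    """Minimize the cost by gradient descent; grad J = 1/|T| * X^T (h - y)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w
```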
