Perceptron Algorithm

An aside: a hyperplane is a perceptron (a single-layer neural network; do you see the linear programming?).
Today: hyperplane separators, margins, the perceptron, support vector machines, Lagrange multipliers.

Margins.
Labelled +/− points x_1, ..., x_n, placed on the unit ball, so w · x = cos θ.
A hyperplane w has margin γ if
− w · x ≥ γ for + points,
− w · x ≤ −γ for − points.
Will assume positive labels: negate the negative points.

Perceptron.
Alg: Given x_1, ..., x_n.
  Let w_1 = x_1.
  For each x_i where w_t · x_i has the wrong sign (negative):
    w_{t+1} = w_t + x_i
    t = t + 1
Theorem: the algorithm makes at most 1/γ² mistakes.
Idea: on a mistake on positive x_i,
  w_{t+1} · x_i = (w_t + x_i) · x_i = w_t · x_i + 1.
A step in the right direction!

Claim 1: w_{t+1} · w ≥ w_t · w + γ.
A γ in the right direction! On a mistake on positive x_i:
  w_{t+1} · w = (w_t + x_i) · w = w_t · w + x_i · w ≥ w_t · w + γ.

Claim 2: |w_{t+1}|² ≤ |w_t|² + 1.
Geometrically: on a mistake w_t · x_i ≤ 0, so w_t and x_i meet at no less than a right angle, and
  |w_{t+1}|² ≤ |w_t|² + |x_i|² ≤ |w_t|² + 1.
Algebraically, for positive x_i with w_t · x_i ≤ 0:
  |w_t + x_i|² = |w_t|² + 2 w_t · x_i + |x_i|² ≤ |w_t|² + |x_i|² = |w_t|² + 1.
Note: Claim 2 holds even if there is no separating hyperplane!

Putting it together...
Claim 1: w_{t+1} · w ≥ w_t · w + γ  ⇒  w_t · w ≥ t γ.
Claim 2: |w_{t+1}|² ≤ |w_t|² + 1  ⇒  |w_t|² ≤ t.
Let M be the number of mistakes and take t = M; since w is a unit vector,
  γ M ≤ w_M · w ≤ |w_M| ≤ √M  ⇒  M ≤ 1/γ².

Hinge Loss.
What if only most of the data has a good separator?
Rotate the bad points to have a γ-margin; total rotation TD_γ.
Analysis: subtract the bad tilting. On a rotated point we don't make progress, or we tilt the wrong way. How much bad tilting?
Claim 1 becomes: w_{t+1} · w ≥ w_t · w + γ − (rotation for x_i)  ⇒  w_M · w ≥ γ M − TD_γ.
Claim 2 still gives |w_M| ≤ √M, so γ M − TD_γ ≤ √M.
Quadratic inequality: γ² M² − (2 γ TD_γ + 1) M + TD_γ² ≤ 0.
One implication: M ≤ 1/γ² + 2 TD_γ / γ.
The extra term is (twice) the amount of rotation in units of 1/γ; the hinge loss is (1/γ) TD_γ.
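To make the mistake bound concrete, here is a minimal sketch of the perceptron on synthetic data with a known unit separator and margin γ. The names (w_star, gamma) and the data-generation step are illustrative assumptions, not from the lecture; the update rule and the 1/γ² bound are the ones above.

    # Perceptron sketch: unit-ball points with margin gamma, negatives negated.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, gamma = 5, 500, 0.1

    w_star = rng.normal(size=d)                 # assumed "true" separator
    w_star /= np.linalg.norm(w_star)

    pts = rng.normal(size=(n, d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)        # points on the unit ball
    pts = pts[np.abs(pts @ w_star) >= gamma]                  # keep a gamma-margin
    pts = np.where((pts @ w_star >= 0)[:, None], pts, -pts)   # "negate the negative"

    w = pts[0].copy()                            # w_1 = x_1
    mistakes = 0
    changed = True
    while changed:                               # pass over the data until no mistakes
        changed = False
        for x in pts:
            if w @ x <= 0:                       # wrong sign: a mistake
                w = w + x                        # w_{t+1} = w_t + x_i
                mistakes += 1
                changed = True

    print("mistakes:", mistakes, " bound 1/gamma^2:", 1 / gamma**2)

Running this, the mistake count stays well under 1/γ² = 100, as the theorem promises.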
Approximately Maximizing Margin Algorithm

Margin approximation: there is a γ-separating hyperplane. Find it! (Kind of.)
Treat any point within γ/2 of the hyperplane as a mistake.
Alg: Let w_1 = x_1.
  For each x_2, ..., x_n:
    if w_t · x_i < γ/2: w_{t+1} = w_t + x_i, t = t + 1.

Claim 1: w_{t+1} · w ≥ w_t · w + γ. Same (ish) as before; it implies |w_M| ≥ w_M · w ≥ γ M.

Claim 2(?): |w_{t+1}|² ≤ |w_t|² + 1?? Not quite: we now add x_i to w_t even when it points in the correct direction.
Obtuse-triangle picture: split x_i into its component orthogonal to w_t and its component along w_t.
Adding the orthogonal part gives a vector v with |v|² ≤ |w_t|² + 1, so |v| ≤ |w_t| + 1/(2 |w_t|) (square the right-hand side to check).
The component along w_t (the red bit in the picture) adds at most γ/2, since w_t · x_i < γ/2.
Together: |w_{t+1}| ≤ |w_t| + 1/(2 |w_t|) + γ/2.
If |w_t| ≥ 2/γ, then |w_{t+1}| ≤ |w_t| + (3/4) γ.
After M updates: |w_M| ≤ 2/γ + (3/4) γ M.
Combining with Claim 1: γ M ≤ 2/γ + (3/4) γ M  ⇒  M ≤ 8/γ².
Other fat separators? Support Vector Machines.

Kernel Functions.
Map x to φ(x) and look for a hyperplane separator for the points under φ(·).
Problem: the complexity of computing in the higher-dimensional space.
Recall the perceptron: it only computes dot products!
Test: w_t · x_i > γ, where w_t = x_{i_1} + x_{i_2} + x_{i_3} + ··· is a sum of the mistaken points.
Support vectors: x_{i_1}, x_{i_2}, ...  →  Support Vector Machine.
Kernel trick: compute the dot products in the original space.
Kernel function for mapping φ(·): K(x, y) = φ(x) · φ(y).

Example: points inside versus outside a circle have no hyperplane separator, but a circle separator!
Map point (x, y) to (x, y, x² + y²): a hyperplane separator in three dimensions.
Video: "http://www.youtube.com/watch?v=3liCbRZPrZA"

Some kernels:
− K(x, y) = (1 + x · y)^d: polynomial kernel; φ(x) = [1, ..., x_i, ..., x_i x_j, ...].
− K(x, y) = (1 + x_1 y_1)(1 + x_2 y_2) ··· (1 + x_n y_n): φ(x) has a coordinate for each product of a subset of coordinates (the Boolean Fourier basis).
− K(x, y) = exp(−C |x − y|²): the Gaussian kernel; φ maps into an infinite-dimensional space (the expansion of e^z).
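As a sketch of how the kernel trick plugs into the perceptron (an illustrative combination, not code from the lecture): the weight vector is never formed in feature space; it is kept as a sum of mistaken points, so every test is a sum of kernel evaluations. The circle-separable data and the degree-2 polynomial kernel mirror the (x, y) → (x, y, x² + y²) example above; the dataset, pass count, and variable names are assumptions.

    # Kernel perceptron sketch: dot products only, via K(x, y) = (1 + x.y)^2.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(300, 2))
    r2 = np.sum(X**2, axis=1)
    keep = np.abs(r2 - 0.5) > 0.1            # leave a gap around the circle r^2 = 0.5
    X, r2 = X[keep], r2[keep]
    y = np.where(r2 < 0.5, 1.0, -1.0)        # +1 inside the circle, -1 outside

    def K(a, b):
        return (1.0 + a @ b) ** 2            # degree-2 polynomial kernel

    # w_t is never written down: it is a sum of (kernelized) mistaken points,
    # so w_t . phi(x) = sum_j alpha_j * y_j * K(x_j, x) -- dot products only.
    alpha = np.zeros(len(X))
    for _ in range(30):                      # a fixed number of passes over the data
        for i in range(len(X)):
            sv = np.nonzero(alpha)[0]
            score = sum(alpha[j] * y[j] * K(X[j], X[i]) for j in sv)
            if y[i] * score <= 0:            # mistake: add this point to the sum
                alpha[i] += 1

    support = np.nonzero(alpha)[0]           # the support vectors
    pred = np.array([np.sign(sum(alpha[j] * y[j] * K(X[j], x) for j in support))
                     for x in X])
    print("training errors:", int(np.sum(pred != y)), " support vectors:", len(support))

The classifier depends only on the points that ever caused a mistake, which is exactly the "support vectors → Support Vector Machine" remark above.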
Support Vector Machine

Pick a kernel. Run an algorithm that:
(1) uses only dot products;
(2) outputs a hyperplane that is a linear combination of the points.
The perceptron is one such algorithm.
Max-margin problem as convex optimization: min |w|² where ∀ i, w · x_i ≥ 1.
Such algorithms output the tight constraints (the support vectors)!
The solution is a linear combination of points: w = α_1 x_1 + α_2 x_2 + ···.
With kernel φ(·): the problem is to find α_i where ∀ i, (∑_j α_j φ(x_j)) · φ(x_i) ≥ 1.

Lagrangian Dual.
Remember calculus (constrained optimization): Lagrange multipliers.
Feasibility problem: find x subject to f_i(x) ≤ 0, i = 1, ..., m.
Lagrangian: L(x, λ) = ∑_{i=1}^m λ_i f_i(x), where λ_i ≥ 0 is the Lagrange multiplier for inequality i.
For a feasible x and any valid λ, L(x, λ) is non-positive.
So if there exists λ ≥ 0 where L(x, λ) is positive for all x, there is no feasible x.

Lagrangian: constrained optimization.
min f(x) subject to f_i(x) ≤ 0, i = 1, ..., m.
Lagrangian function: L(x, λ) = f(x) + ∑_{i=1}^m λ_i f_i(x).
If a (primal) x has value v, i.e. f(x) = v and all f_i(x) ≤ 0, then for all λ ≥ 0, L(x, λ) ≤ v.
If there is a λ with L(x, λ) ≥ α for all x, the optimum value of the program is at least α.
Primal problem: find x minimizing max_{λ ≥ 0} L(x, λ).
Dual problem: find λ ≥ 0 maximizing min_x L(x, λ).

Linear Program.
min c · x subject to A x ≥ b, i.e. min c · x subject to b_i − a_i · x ≤ 0, i = 1, ..., m.
L(x, λ) = c · x + ∑_i λ_i (b_i − a_i · x) = −∑_j x_j (a_j · λ − c_j) + b · λ.
Best λ? To keep the minimum over x finite we need a_j · λ = c_j for every j, leaving b · λ to maximize.
Dual problem: max b · λ, λ^T A = c, λ ≥ 0. Duals!

Why important: KKT.
Karush, Kuhn and Tucker Conditions.
min f(x) subject to f_i(x) ≤ 0, i = 1, ..., m;  L(x, λ) = f(x) + ∑_{i=1}^m λ_i f_i(x).
At a local minimum x* that is feasible, there exist multipliers λ where
  ∇f(x*) + ∑_i λ_i ∇f_i(x*) = 0,
with primal feasibility f_i(x*) ≤ 0, dual feasibility λ_i ≥ 0,
and complementary slackness λ_i f_i(x*) = 0: when maximizing over λ, λ_i > 0 only when f_i(x*) = 0.
Launched nonlinear programming! See the paper.
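A small numeric check of the Lagrangian statements above on a hand-picked toy LP (the specific c, A, b and the chosen points are illustrative assumptions, not from the lecture): for feasible x and λ ≥ 0 the Lagrangian is at most the primal value; when λ^T A = c it collapses to b · λ, a lower bound on the optimum; and complementary slackness holds at the optimum.

    # Toy LP:  min c.x  subject to  A x >= b,  written as  b_i - a_i.x <= 0.
    import numpy as np

    c = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0],        # x1 + x2 >= 1
                  [1.0, 0.0],        # x1      >= 0
                  [0.0, 1.0]])       # x2      >= 0
    b = np.array([1.0, 0.0, 0.0])

    def L(x, lam):                   # L(x, lam) = c.x + sum_i lam_i (b_i - a_i.x)
        return c @ x + lam @ (b - A @ x)

    x_feas = np.array([2.0, 1.0])    # a feasible (not optimal) primal point
    x_opt  = np.array([1.0, 0.0])    # the optimal primal point for this LP
    lam    = np.array([1.0, 0.0, 1.0])   # lam >= 0 with A^T lam = c (feasible dual)
    assert np.all(A @ x_feas >= b) and np.all(lam >= 0) and np.allclose(A.T @ lam, c)

    # Feasible x, lam >= 0: each term lam_i (b_i - a_i.x) <= 0, so L(x, lam) <= c.x.
    print(L(x_feas, lam), "<=", c @ x_feas)                    # 1.0 <= 4.0

    # When A^T lam = c the x-terms cancel, so L(x, lam) = b.lam for every x:
    # b.lam lower-bounds the optimum, hence "max b.lam, lam^T A = c, lam >= 0".
    print(np.isclose(L(np.array([-3.7, 2.2]), lam), b @ lam))  # True

    # Complementary slackness at the optimum: lam_i * f_i(x*) = 0 for every i.
    print(lam * (b - A @ x_opt))                               # [0. 0. 0.]

Here both the primal optimum c · x* and the dual value b · λ equal 1, so the bound is tight for this example.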