

  1. The Perceptron Algorithm
Machine Learning
Some slides based on lectures from Dan Roth, Avrim Blum and others

  2. Outline
• The Perceptron Algorithm
• Variants of Perceptron
• Perceptron Mistake Bound

  3. Where are we?
• The Perceptron Algorithm
• Variants of Perceptron
• Perceptron Mistake Bound

  4. Recall: Linear Classifiers

Inputs are d-dimensional vectors, denoted by x. The output is a label y ∈ {−1, 1}. Linear threshold units classify an example x using parameters w (a d-dimensional vector) and b (a real number) according to the following classification rule:

Output = sgn(w^T x + b) = sgn(∑_i w_i x_i + b)

w^T x + b ≥ 0 ⇒ y = +1
w^T x + b < 0 ⇒ y = −1

b is called the bias term.

  5. Recall: Linear Classifiers

(The same rule as the previous slide, drawn as a diagram.) Inputs x_1, …, x_d are each multiplied by a weight w_1, …, w_d, a constant input 1 is multiplied by the bias b, and the sum feeds into a sgn unit:

Output = sgn(w^T x + b) = sgn(∑_i w_i x_i + b)

b is called the bias term.
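The classification rule above can be sketched directly in code. A minimal illustration; the helper name `predict` and the example weights are mine, not from the slides:

```python
import numpy as np

def predict(w, b, x):
    """Linear threshold unit: sgn(w^T x + b), with sgn(0) taken as +1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# A 2-D example with w = [1, -1], b = 0
w = np.array([1.0, -1.0])
b = 0.0
print(predict(w, b, np.array([2.0, 1.0])))   # w.x + b = 1  >= 0  -> +1
print(predict(w, b, np.array([0.0, 3.0])))   # w.x + b = -3 < 0  -> -1
```

Only the sign of w^T x + b matters, which is the point the next slide makes geometrically.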

  6. The geometry of a linear classifier

sgn(b + w_1 x_1 + w_2 x_2). We only care about the sign, not the magnitude.

[Figure: the line b + w_1 x_1 + w_2 x_2 = 0 in the (x_1, x_2) plane, with positive examples (+) on one side, negative examples (−) on the other, and the weight vector [w_1 w_2] drawn normal to the line.]

In higher dimensions, a linear classifier represents a hyperplane that separates the space into two half-spaces.

  7. The Perceptron

  8. The Perceptron algorithm
• Rosenblatt 1958 (though there were some hints of a similar idea earlier, e.g., Agmon 1954)
• The goal is to find a separating hyperplane; for separable data, it is guaranteed to find one
• An online algorithm: it processes one example at a time
• Several variants exist; we will see these briefly towards the end

  9–16. The Perceptron algorithm (one slide, built up across slides 9–16)

Input: A sequence of training examples (x_1, y_1), (x_2, y_2), … where all x_i ∈ ℝ^d, y_i ∈ {−1, 1}

1. Initialize w_0 = 0 ∈ ℝ^d
2. For each training example (x_i, y_i):
   a. Predict y' = sgn(w_t^T x_i)
   b. If y' ≠ y_i: update w_{t+1} ← w_t + r (y_i x_i)
3. Return the final weight vector

Annotations added on the successive slides:
• Remember: prediction = sgn(w^T x). There is typically a bias term as well (w^T x + b), but the bias may be treated as a constant feature and folded into w.
• Footnote: for some algorithms it is mathematically easier to represent False as −1, and at other times as 0. For the Perceptron algorithm, treat −1 as false and +1 as true.
• The update has two cases. Mistake on a positive example: w_{t+1} ← w_t + r x_i. Mistake on a negative example: w_{t+1} ← w_t − r x_i.
• r is the learning rate, a small positive number less than 1.
• Update only on error: the Perceptron is a mistake-driven algorithm.
• A mistake can be written as y_i w_t^T x_i ≤ 0.
• This is the simplest version; we will see more robust versions shortly.
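The algorithm above fits in a few lines of NumPy. A minimal sketch, not code from the lecture; the function name, toy data, and hyperparameter values are my own. The bias is folded into w via a constant feature, as the slides suggest:

```python
import numpy as np

def perceptron(examples, r=0.1, epochs=10):
    """Train a perceptron on (x, y) pairs with y in {-1, +1}.

    The bias is folded into w by appending a constant feature 1 to each x.
    """
    d = len(examples[0][0])
    w = np.zeros(d + 1)                    # last component plays the role of b
    for _ in range(epochs):
        for x, y in examples:
            x_aug = np.append(x, 1.0)      # constant feature for the bias
            if y * np.dot(w, x_aug) <= 0:  # mistake: y_i w^T x_i <= 0
                w = w + r * y * x_aug      # mistake-driven update
    return w

# Linearly separable toy data: positive iff x1 + x2 > 1
data = [(np.array([2.0, 2.0]), 1), (np.array([1.5, 0.0]), 1),
        (np.array([0.0, 0.0]), -1), (np.array([-1.0, 0.5]), -1)]
w = perceptron(data)
```

On this separable toy set the loop converges in a few epochs; note that the mistake test `y * np.dot(w, x_aug) <= 0` is exactly the condition y_i w_t^T x_i ≤ 0 from the slide.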

  17. Intuition behind the update

Mistake on a positive example: w_{t+1} ← w_t + r x. Mistake on a negative example: w_{t+1} ← w_t − r x.

Suppose we have made a mistake on a positive example. That is, y = +1 and w_t^T x ≤ 0. Call the new weight vector w_{t+1} = w_t + x (say r = 1). The new dot product is

w_{t+1}^T x = (w_t + x)^T x = w_t^T x + x^T x ≥ w_t^T x

For a positive example, the Perceptron update will increase the score assigned to the same input. Similar reasoning applies for negative examples.
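The dot-product argument above can be checked numerically. The vectors here are made up for illustration:

```python
import numpy as np

# A positive example that the current weights get wrong: w^T x <= 0
w_old = np.array([1.0, -2.0])
x = np.array([0.5, 1.0])           # w_old . x = 0.5 - 2.0 = -1.5 <= 0
assert np.dot(w_old, x) <= 0

# Perceptron update for a mistake on a positive example (r = 1)
w_new = w_old + x

# The score rises by exactly x . x = ||x||^2
assert np.isclose(np.dot(w_new, x), np.dot(w_old, x) + np.dot(x, x))
assert np.dot(w_new, x) > np.dot(w_old, x)
```

One update need not flip the sign of the score (here it goes from −1.5 to −0.25), but repeated updates on the same mistake eventually do.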

  18–24. Geometry of the perceptron update (one figure, built up across slides 18–24)

Mistake on a positive example: w_{t+1} ← w_t + r x. Mistake on a negative example: w_{t+1} ← w_t − r x.

[Figure: the prediction hyperplane defined by w_old, with a positive example (x, +1) on the wrong side. The update w ← w + y x adds the vector x to w_old; after the update, the new weight vector w_new is rotated toward x, so the hyperplane moves toward classifying (x, +1) correctly.]

For a mistake on a positive example.

  25–26. Geometry of the perceptron update

[Figure: the prediction hyperplane defined by w_old, with a negative example (x, −1) on the wrong side. For a mistake on a negative example, the update subtracts x from w_old, rotating the hyperplane away from x.]
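The negative-example case in the figure mirrors the positive one: subtracting x lowers the score the classifier assigns to x. A small numeric check with illustrative values and r = 1:

```python
import numpy as np

# A negative example the current weights get wrong: w^T x >= 0
w_old = np.array([1.0, 1.0])
x = np.array([2.0, 0.5])           # w_old . x = 2.5 >= 0, but the label is -1

# Perceptron update for a mistake on a negative example
w_new = w_old - x                   # w_new = [-1.0, 0.5]

# The score drops by ||x||^2, pushing x toward the negative side
assert np.dot(w_new, x) < np.dot(w_old, x)
print(np.dot(w_old, x), np.dot(w_new, x))   # 2.5 -1.75
```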
