
Today: Perceptron, Support Vector Machine (PowerPoint PPT Presentation)

Today: Perceptron. Support Vector Machine. Labelled points $x_1, \ldots, x_n$. Hyperplane separator.


  7. Alg: Given unit vectors $x_1, \ldots, x_n$ ($|x_i| = 1$). Let $w_1 = x_1$. For each $x_i$: if $w_t \cdot x_i$ has the wrong sign (negative), set $w_{t+1} = w_t + x_i$ and $t = t + 1$.
     Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.
     Geometrically: in the triangle with sides $w_t$, $x_i$ and $w_{t+1} = w_t + x_i$, the angle opposite $w_{t+1}$ is at most a right angle! $\rightarrow |w_{t+1}|^2 \le |w_t|^2 + |x_i|^2 \le |w_t|^2 + 1$.
     Algebraically: for a positive $x_i$, a mistake means $w_t \cdot x_i \le 0$, so
     $|w_t + x_i|^2 = |w_t|^2 + 2\, w_t \cdot x_i + |x_i|^2 \le |w_t|^2 + |x_i|^2 = |w_t|^2 + 1$.
     Claim 2 holds even if there is no separating hyperplane!
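
A minimal Python sketch of the update rule above, under assumptions not in the slides: points are plain lists, already unit length and sign-flipped so that every point is "positive" (the goal is $w \cdot x_i > 0$), and `perceptron_pass` and `random_unit` are hypothetical helper names. It records $|w_t|^2$ after every update so Claim 2 can be checked; the demo deliberately uses random, non-separable points, since Claim 2 does not need a separator.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_unit(d, rng):
    """Random direction in d dimensions, scaled to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def perceptron_pass(xs):
    """One pass of the perceptron over unit, sign-flipped points.

    Returns the final w and |w_t|^2 recorded after the initialization
    and after every update.
    """
    w = list(xs[0])                              # w_1 = x_1
    sq_norms = [dot(w, w)]
    for x in xs[1:]:
        if dot(w, x) <= 0:                       # wrong sign: a mistake
            w = [a + b for a, b in zip(w, x)]    # w_{t+1} = w_t + x_i
            sq_norms.append(dot(w, w))
    return w, sq_norms


rng = random.Random(0)
points = [random_unit(5, rng) for _ in range(200)]   # no separator assumed
w, sq_norms = perceptron_pass(points)

# Claim 2 on every update: |w_{t+1}|^2 <= |w_t|^2 + 1 (small float slack).
claim2_ok = all(b <= a + 1 + 1e-9 for a, b in zip(sq_norms, sq_norms[1:]))
print(len(sq_norms) - 1, "updates; Claim 2 held:", claim2_ok)
```

Because $|w_1|^2 = 1$ and each update adds at most 1, the last recorded value is also at most the number of recorded vectors.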

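  15. Putting it together...
      Claim 1: $w_{t+1} \cdot w^* \ge w_t \cdot w^* + \gamma$, where $w^*$ is the unit-length separator with margin $\gamma$.
      Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.
      $M$: number of mistakes in the algorithm, with $w_M$ the vector after them (counting $w_1 = x_1$ as the first).
      $\gamma M \le w_M \cdot w^* \le |w_M| \le \sqrt{M}$ (Claim 1; Cauchy-Schwarz with $|w^*| = 1$; Claim 2).
      $\rightarrow M \le \frac{1}{\gamma^2}$.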
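
To see the bound $M \le 1/\gamma^2$ numerically, here is a hedged Python sketch on synthetic data. Assumptions not in the slides: the separator is taken to be $w^* = (1, 0, 0)$; points are random unit vectors kept only if $|w^* \cdot x| \ge \gamma$ and negated when $w^* \cdot x < 0$ (so every point is positive); and the algorithm cycles through the data until a full pass makes no mistakes. `make_separable` and `perceptron_until_clean` are illustrative names. $\gamma = 0.125$ is chosen so that $1/\gamma^2 = 64$ is exact in floating point.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_separable(n, gamma, rng):
    """Unit vectors in 3-D with w* . x >= gamma for the assumed w* = (1, 0, 0)."""
    pts = []
    while len(pts) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        x = [a / norm for a in v]
        if abs(x[0]) >= gamma:                       # keep points outside the margin
            pts.append(x if x[0] > 0 else [-a for a in x])   # make every point positive
    return pts


def perceptron_until_clean(xs, max_passes=10_000):
    """Repeat perceptron passes until none makes a mistake.

    The initialization w_1 = x_1 is counted as mistake number 1.
    """
    w = list(xs[0])
    mistakes = 1
    for _ in range(max_passes):
        clean = True
        for x in xs:
            if dot(w, x) <= 0:
                w = [a + b for a, b in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            return w, mistakes
    raise RuntimeError("did not converge within max_passes")


gamma = 0.125
rng = random.Random(1)
points = make_separable(300, gamma, rng)
w, M = perceptron_until_clean(points)
print("mistakes:", M, "bound 1/gamma^2:", 1 / gamma**2)
```

Cycling over the data more than once does not weaken the bound: Claims 1 and 2 hold for every update, whichever pass it happens in.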

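  35. Hinge Loss. Most of the data has a good separator $w^*$.
      Claim 1: $w_{t+1} \cdot w^* \ge w_t \cdot w^* + \gamma$ can fail: on a point inside the margin or on the wrong side, an update doesn't make progress, or even tilts the wrong way.
      How much bad tilting? Rotate the points so they have $\gamma$-margin. Total rotation: $TD_\gamma$.
      Analysis: subtract the bad tilting part.
      Claim 1: $w_{t+1} \cdot w^* \ge w_t \cdot w^* + \gamma - (\text{rotation for } x_{i_t})$, so $w_M \cdot w^* \ge \gamma M - TD_\gamma$.
      + Claim 2 $\rightarrow \gamma M - TD_\gamma \le \sqrt{M}$.
      Squaring (when the left side is nonnegative) gives the quadratic inequality $\gamma^2 M^2 - (2\gamma TD_\gamma + 1) M + TD_\gamma^2 \le 0$. Uh...
      One implication (take the larger root and use $\sqrt{1 + 4\gamma TD_\gamma} \le 1 + 2\gamma TD_\gamma$): $M \le \frac{1}{\gamma^2} + \frac{2}{\gamma} TD_\gamma$.
      The extra term is (twice) the amount of rotation in units of $\gamma$. Hinge loss: $\frac{1}{\gamma} TD_\gamma$.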
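
A numerical check of the hinge-loss bound, as a Python sketch with its own assumptions: $w^* = (1, 0, 0)$; random unit points in 3-D labelled by the sign of $w^* \cdot x$, with a hypothetical 10% of labels flipped; and a single pass over the data, so each point contributes its shortfall at most once. $TD_\gamma$ is computed here as $\sum_i \max(0, \gamma - w^* \cdot x_i)$, the amount Claim 1 subtracts for each point; this is one reading of "total rotation", and the helper names are made up for the example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def noisy_points(n, flip_prob, rng):
    """Unit vectors in 3-D, made positive w.r.t. w* = (1, 0, 0), then some flipped."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        x = [a / norm for a in v]
        if x[0] < 0:
            x = [-a for a in x]        # label by w*: make the point positive
        if rng.random() < flip_prob:
            x = [-a for a in x]        # label noise: now on the wrong side of w*
        pts.append(x)
    return pts


def total_shortfall(xs, w_star, gamma):
    """TD_gamma: total amount by which the points fall short of margin gamma."""
    return sum(max(0.0, gamma - dot(w_star, x)) for x in xs)


def perceptron_single_pass(xs):
    """One perceptron pass; the initialization w_1 = x_1 counts as mistake 1."""
    w = list(xs[0])
    mistakes = 1
    for x in xs[1:]:
        if dot(w, x) <= 0:
            w = [a + b for a, b in zip(w, x)]
            mistakes += 1
    return w, mistakes


gamma = 0.125
w_star = [1.0, 0.0, 0.0]
rng = random.Random(2)
points = noisy_points(500, 0.1, rng)

td = total_shortfall(points, w_star, gamma)
hinge = td / gamma                     # hinge loss (1/gamma) * TD_gamma
_, M = perceptron_single_pass(points)
bound = 1 / gamma**2 + 2 * td / gamma
print(f"M = {M}, bound = {bound:.1f}, hinge loss = {hinge:.1f}")
```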

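  46. Approximately Maximizing Margin Algorithm. There is a $\gamma$-separating hyperplane $w^*$. Find it! (Kind of.)
      Any point within $\gamma/2$ of the current hyperplane is still a mistake.
      Let $w_1 = x_1$. For each of $x_2, \ldots, x_n$: if $\frac{w_t \cdot x_i}{|w_t|} < \gamma/2$, set $w_{t+1} = w_t + x_i$ and $t = t + 1$.
      Claim 1: $w_{t+1} \cdot w^* \ge w_t \cdot w^* + \gamma$. Same (ish) as before: every update still adds a point with $w^* \cdot x_i \ge \gamma$.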
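
A hedged Python sketch of this variant. Assumptions: the "within $\gamma/2$" test is implemented as the normalized margin $w_t \cdot x_i / |w_t| < \gamma/2$; the data are synthetic unit vectors with margin $\gamma$ around a hypothetical $w^* = (1, 0, 0)$; and the loop repeats passes over the data until no point triggers an update. `make_separable` and `approx_margin_perceptron` are illustrative names. The point is only to show that the final $w$ leaves every point at least $\gamma/2$ from its hyperplane.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def make_separable(n, gamma, rng):
    """Unit vectors in 3-D with w* . x >= gamma for the assumed w* = (1, 0, 0)."""
    pts = []
    while len(pts) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n_v = norm(v)
        x = [a / n_v for a in v]
        if abs(x[0]) >= gamma:
            pts.append(x if x[0] > 0 else [-a for a in x])
    return pts


def approx_margin_perceptron(xs, gamma, max_passes=10_000):
    """Update whenever a point lies within gamma/2 of the current hyperplane."""
    w = list(xs[0])                                  # w_1 = x_1
    updates = 0
    for _ in range(max_passes):
        changed = False
        for x in xs:
            if dot(w, x) / norm(w) < gamma / 2:      # still counts as a mistake
                w = [a + b for a, b in zip(w, x)]    # w_{t+1} = w_t + x_i
                updates += 1
                changed = True
        if not changed:
            return w, updates
    raise RuntimeError("did not converge within max_passes")


gamma = 0.125
rng = random.Random(3)
points = make_separable(200, gamma, rng)
w, updates = approx_margin_perceptron(points, gamma)
min_margin = min(dot(w, x) / norm(w) for x in points)
print("updates:", updates, "smallest normalized margin:", round(min_margin, 4))
```

Since every update adds a point with $w^* \cdot x_i \ge \gamma$, $w$ never becomes zero, so the division by $|w_t|$ is safe here.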

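  58. Margin Approximation: Claim 2
      Claim 2(?): $|w_{t+1}|^2 \le |w_t|^2 + 1$??
      When $w_t \cdot x_i \le 0$ the old argument still works, but now we add $x_i$ to $w_t$ even if it points in the correct direction ($0 < \frac{w_t \cdot x_i}{|w_t|} < \gamma/2$). Obtuse triangle.
      Let $v = w_t + x_\perp$, where $x_\perp$ is the part of $x_i$ perpendicular to $w_t$; the remaining part of $x_i$, along $w_t$, is the red bit.
      $|v|^2 = |w_t|^2 + |x_\perp|^2 \le |w_t|^2 + 1 \rightarrow |v| \le |w_t| + \frac{1}{2|w_t|}$ (square the right-hand side).
      The red bit has length $\frac{w_t \cdot x_i}{|w_t|}$, at most $\gamma/2$.
      Together: $|w_{t+1}| \le |w_t| + \frac{1}{2|w_t|} + \frac{\gamma}{2}$.
      If $|w_t| \ge \frac{2}{\gamma}$, then $|w_{t+1}| \le |w_t| + \frac{3}{4}\gamma$.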
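
The growth bound can be spot-checked numerically. This Python sketch draws hypothetical random pairs $(w_t, x_i)$ in 3-D with $|x_i| = 1$, keeps only those where the update fires under the normalized test $w_t \cdot x_i / |w_t| < \gamma/2$, and compares $|w_t + x_i|$ against $|w_t| + \frac{1}{2|w_t|} + \frac{\gamma}{2}$, and against $|w_t| + \frac{3}{4}\gamma$ when $|w_t| \ge 2/\gamma$. The helper `step_check` is an illustrative name, and a random check is not a proof; it only guards against misreading the inequality.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def random_unit(d, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = norm(v)
    return [a / n for a in v]


def step_check(w, x, gamma):
    """Return (|w + x|, general bound, large-|w| bound or None) for one update."""
    a = norm(w)
    new_len = norm([p + q for p, q in zip(w, x)])
    general = a + 1 / (2 * a) + gamma / 2
    large = a + 0.75 * gamma if a >= 2 / gamma else None
    return new_len, general, large


gamma = 0.125
rng = random.Random(4)
violations = 0
checked = 0
for _ in range(20_000):
    length = rng.uniform(0.5, 40.0)          # |w_t| spans both regimes (2/gamma = 16)
    w = [length * c for c in random_unit(3, rng)]
    x = random_unit(3, rng)
    if dot(w, x) / norm(w) >= gamma / 2:
        continue                              # update would not fire
    new_len, general, large = step_check(w, x, gamma)
    checked += 1
    if new_len > general + 1e-12 or (large is not None and new_len > large + 1e-12):
        violations += 1
print(checked, "updates checked,", violations, "violations")
```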
