Alg: Given $x_1, \ldots, x_n$. Let $w_1 = x_1$.
For each $x_i$: if $w_t \cdot x_i$ has the wrong sign (negative), set $w_{t+1} = w_t + x_i$ and $t = t + 1$.

Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.

[Figure: $w_t$, $x_i$, and $w_{t+1} = w_t + x_i$.]

Less than a right angle! $\rightarrow |w_{t+1}|^2 \le |w_t|^2 + |x_i|^2 \le |w_t|^2 + 1$.

Algebraically. For a positive $x_i$ that is a mistake, $w_t \cdot x_i \le 0$.
$(w_t + x_i)^2 = |w_t|^2 + 2\, w_t \cdot x_i + |x_i|^2 \le |w_t|^2 + |x_i|^2 = |w_t|^2 + 1$.

Claim 2 holds even if there is no separating hyperplane!
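The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the names `dot`, `perceptron`, and `norms_sq` are made up here, the points are assumed to be flipped already so that a correct classifier has $w \cdot x > 0$, and a mistake is taken to mean $w_t \cdot x_i \le 0$, as in the algebraic argument.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def perceptron(points):
    """Single pass of the update rule over the points.

    Assumption (not from the slides): points are pre-flipped so that the
    desired sign of w . x is positive for every point.
    Returns the final w, the number of mistakes, and |w_t|^2 after each update.
    """
    w = list(points[0])            # w_1 = x_1
    mistakes = 0
    norms_sq = [dot(w, w)]         # track |w_t|^2 to inspect Claim 2
    for x in points[1:]:
        if dot(w, x) <= 0:         # wrong sign: a mistake
            w = [wi + xi for wi, xi in zip(w, x)]   # w_{t+1} = w_t + x_i
            mistakes += 1
            norms_sq.append(dot(w, w))
    return w, mistakes, norms_sq


points = [(1.0, 0.0), (-0.6, 0.8), (0.0, 1.0)]   # unit-length example points
w, mistakes, norms_sq = perceptron(points)
```

On unit-length points, each consecutive pair in `norms_sq` should differ by at most 1, which is exactly Claim 2.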
Putting it together...
Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma$.
Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.
$M$: number of mistakes in the algorithm.
After $M$ mistakes: $\gamma M \le w_{t+1} \cdot w \le \|w_{t+1}\| \le \sqrt{M}$.
$\rightarrow M \le \frac{1}{\gamma^2}$.
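A worked version of the chain above, as a sketch under assumptions the slide leaves implicit: $w$ is a unit vector, every $x_i$ satisfies $|x_i| \le 1$ and $x_i \cdot w \ge \gamma$, and $T$ (a symbol introduced here) counts the initial $w_1 = x_1$ together with the $M$ mistakes, so $T = M + 1$ and the current vector is $w_T$.

```latex
\begin{align*}
w_T \cdot w &= \sum_{\text{added } x_i} x_i \cdot w \;\ge\; \gamma T
  && \text{(Claim 1, once per added point)} \\
|w_T|^2 &\le T
  && \text{(Claim 2, with } |w_1|^2 = |x_1|^2 \le 1\text{)} \\
\gamma T &\le w_T \cdot w \le |w_T|\,|w| = |w_T| \le \sqrt{T}
  && \text{(Cauchy-Schwarz, } |w| = 1\text{)} \\
T &\le \frac{1}{\gamma^2}, \qquad \text{so} \qquad M = T - 1 \le \frac{1}{\gamma^2}.
\end{align*}
```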
Hinge Loss.
Most of the data has a good separator.
Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma$.
For the remaining points, an update may make no progress or tilt the wrong way.
How much bad tilting? Rotate points to have $\gamma$-margin. Total rotation: $TD_\gamma$.
Analysis: subtract the bad tilting part.
Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma - (\text{rotation for } x_{i_t})$.
$w_M \cdot w \ge \gamma M - TD_\gamma$.
+ Claim 2 $\rightarrow \gamma M - TD_\gamma \le \sqrt{M}$.
Quadratic equation: $\gamma^2 M^2 - (2 \gamma\, TD_\gamma + 1) M + TD_\gamma^2 \le 0$.
Uh...
One implication: $M \le \frac{1}{\gamma^2} + \frac{2}{\gamma} TD_\gamma$.
The extra is (twice) the amount of rotation in units of $\gamma$.
Hinge loss: $\frac{1}{\gamma} TD_\gamma$.
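The step from the quadratic to the stated implication can be filled in as follows. This is one way to get there, assuming $\gamma M - TD_\gamma \ge 0$ so that squaring $\gamma M - TD_\gamma \le \sqrt{M}$ is valid (otherwise $M < TD_\gamma / \gamma$ and the bound holds directly).

```latex
\begin{align*}
(\gamma M - TD_\gamma)^2 \le M
  &\iff \gamma^2 M^2 - (2\gamma\, TD_\gamma + 1) M + TD_\gamma^2 \le 0 \\
M &\le \frac{(2\gamma\, TD_\gamma + 1) + \sqrt{(2\gamma\, TD_\gamma + 1)^2 - 4\gamma^2 TD_\gamma^2}}{2\gamma^2}
  && \text{(larger root)} \\
  &\le \frac{2\,(2\gamma\, TD_\gamma + 1)}{2\gamma^2}
  && \text{(the square root is at most } 2\gamma\, TD_\gamma + 1\text{)} \\
  &= \frac{1}{\gamma^2} + \frac{2}{\gamma}\, TD_\gamma .
\end{align*}
```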
Approximately Maximizing Margin Algorithm
There is a $\gamma$ separating hyperplane. Find it! (Kind of.)
Any point within $\gamma / 2$ is still a mistake.

Let $w_1 = x_1$.
For each $x_2, \ldots, x_n$: if $w_t \cdot x_i < \gamma / 2$, set $w_{t+1} = w_t + x_i$ and $t = t + 1$.

Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \frac{\gamma}{2}$. Same (ish) as before.
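The margin variant differs from the plain rule only in its update test. A minimal Python sketch, using the threshold exactly as written on the slide ($w_t \cdot x_i < \gamma/2$); the names `dot` and `margin_perceptron` are illustrative, and the points are again assumed to be flipped so that the desired sign is positive.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def margin_perceptron(points, gamma):
    """Single pass that also updates on points that are correct but too close.

    Assumption (not from the slides): points are pre-flipped so that the
    desired sign of w . x is positive for every point.
    Returns the final w and the number of updates.
    """
    w = list(points[0])                 # w_1 = x_1
    updates = 0
    for x in points[1:]:                # x_2, ..., x_n
        if dot(w, x) < gamma / 2:       # within gamma/2 still counts as a mistake
            w = [wi + xi for wi, xi in zip(w, x)]   # w_{t+1} = w_t + x_i
            updates += 1
    return w, updates


# (0.2, 0.9) has the correct sign (0.2 > 0) but falls below gamma/2 = 0.25.
w, updates = margin_perceptron([(1.0, 0.0), (0.2, 0.9), (0.6, 0.8)], gamma=0.5)
```

In this example the plain rule would leave $w$ unchanged, while the margin rule performs one update because the second point is on the correct side but inside the $\gamma/2$ band.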
Margin Approximation: Claim 2
Claim 2(?): $|w_{t+1}|^2 \le |w_t|^2 + 1$??
Adding $x_i$ to $w_t$ even if in correct direction.
Obtuse triangle.

[Figure: $w_t$, $x_i$, $v$, and $w_{t+1}$; the part of $x_i$ along $w_t$ (red) is $< \gamma / 2$.]

$|v|^2 \le |w_t|^2 + 1 \rightarrow |v| \le |w_t| + \frac{1}{2 |w_t|}$ (square right hand side.)
Red bit is at most $\gamma / 2$.
Together: $|w_{t+1}| \le |w_t| + \frac{1}{2 |w_t|} + \frac{\gamma}{2}$.
If $|w_t| \ge \frac{2}{\gamma}$, then $|w_{t+1}| \le |w_t| + \frac{3}{4} \gamma$.
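The steps above can be spelled out as a short derivation. It is a sketch that reads $v$ as $w_t$ plus the part of $x_i$ perpendicular to $w_t$, and the "red bit" as the part of $x_i$ along $w_t$; this is an interpretation of the figure, which is not recoverable here.

```latex
\begin{align*}
\left(|w_t| + \frac{1}{2|w_t|}\right)^2
  &= |w_t|^2 + 1 + \frac{1}{4|w_t|^2} \;\ge\; |w_t|^2 + 1 \;\ge\; |v|^2
  &&\Rightarrow\; |v| \le |w_t| + \frac{1}{2|w_t|} \\
|w_{t+1}| &\le |v| + \frac{\gamma}{2} \;\le\; |w_t| + \frac{1}{2|w_t|} + \frac{\gamma}{2}
  && \text{(triangle inequality)} \\
|w_t| \ge \frac{2}{\gamma}
  &\;\Rightarrow\; \frac{1}{2|w_t|} \le \frac{\gamma}{4}
  \;\Rightarrow\; |w_{t+1}| \le |w_t| + \frac{\gamma}{4} + \frac{\gamma}{2} = |w_t| + \frac{3}{4}\gamma .
\end{align*}
```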