Machine Learning Basics Lecture 5: SVM II
Princeton University COS 495 Instructor: Yingyu Liang
Review: SVM objective

SVM objective: $f_{w,b}(x) = w^\top x + b$.

Margin: let $y_i \in \{+1, -1\}$. The margin is
$\gamma = \min_i \frac{y_i(w^\top x_i + b)}{\|w\|}$

Max-margin problem:
$\max_{w,b} \gamma = \max_{w,b} \min_i \frac{y_i(w^\top x_i + b)}{\|w\|}$
Equivalent formulation:
$\min_{w,b} \frac{1}{2}\|w\|^2$ s.t. $y_i(w^\top x_i + b) \ge 1, \forall i$

Lagrangian:
$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i(w^\top x_i + b) - 1]$, where $\boldsymbol{\alpha}$ is the vector of Lagrange multipliers.
Lagrangian duality: equality constraints. To solve
$\min_x f(x)$ s.t. $h_i(x) = 0, \forall 1 \le i \le l$
form the Lagrangian
$\mathcal{L}(x, \boldsymbol{\beta}) = f(x) + \sum_i \beta_i h_i(x)$, where the $\beta_i$'s are called Lagrange multipliers,
and solve the stationarity conditions $\frac{\partial \mathcal{L}}{\partial x_i} = 0$ and $\frac{\partial \mathcal{L}}{\partial \beta_i} = 0$.
Lagrangian duality: general case.
$\min_x f(x)$ s.t. $g_i(x) \le 0, \forall 1 \le i \le k$ and $h_j(x) = 0, \forall 1 \le j \le l$

Generalized Lagrangian:
$\mathcal{L}(x, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f(x) + \sum_i \alpha_i g_i(x) + \sum_j \beta_j h_j(x)$, where the $\alpha_i$'s and $\beta_j$'s are called Lagrange multipliers.

Primal function:
$\theta_P(x) \triangleq \max_{\boldsymbol{\alpha},\boldsymbol{\beta}:\, \alpha_i \ge 0} \mathcal{L}(x, \boldsymbol{\alpha}, \boldsymbol{\beta})$
$\theta_P(x) = f(x)$ if $x$ satisfies all the constraints, $+\infty$ if $x$ does not satisfy the constraints.
So
$\min_x f(x) = \min_x \theta_P(x) = \min_x \max_{\boldsymbol{\alpha},\boldsymbol{\beta}:\, \alpha_i \ge 0} \mathcal{L}(x, \boldsymbol{\alpha}, \boldsymbol{\beta})$

The primal value:
$p^* \triangleq \min_x \theta_P(x) = \min_x \max_{\boldsymbol{\alpha},\boldsymbol{\beta}:\, \alpha_i \ge 0} \mathcal{L}(x, \boldsymbol{\alpha}, \boldsymbol{\beta})$
The dual value:
$d^* \triangleq \max_{\boldsymbol{\alpha},\boldsymbol{\beta}:\, \alpha_i \ge 0} \min_x \mathcal{L}(x, \boldsymbol{\alpha}, \boldsymbol{\beta})$
Weak duality always holds: $d^* \le p^*$.
When is $p^* = d^*$? Under certain conditions,
$p^* = \mathcal{L}(x^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = d^*$
Moreover, $x^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:
$\frac{\partial \mathcal{L}}{\partial x_i} = 0$ (stationarity); $\alpha_i g_i(x) = 0$ (dual complementarity); $g_i(x) \le 0$, $h_j(x) = 0$ (primal constraints); $\alpha_i \ge 0$ (dual constraints).
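As a concrete illustration (my own toy example, not from the lecture), take $\min_x x^2$ subject to $x \ge 1$, i.e. $g(x) = 1 - x \le 0$. The Lagrangian is $\mathcal{L}(x, \alpha) = x^2 + \alpha(1 - x)$; minimizing over $x$ gives the dual function $\theta_D(\alpha) = \alpha - \alpha^2/4$. A quick numerical check shows $p^* = d^* = 1$ and complementarity $\alpha^* g(x^*) = 0$ at the optimum $x^* = 1$, $\alpha^* = 2$:

```python
import numpy as np

# Toy problem: min f(x) = x^2  subject to g(x) = 1 - x <= 0.
# Lagrangian: L(x, a) = x^2 + a * (1 - x), with multiplier a >= 0.

# Primal value p*: brute-force search over the feasible region x >= 1.
xs = np.linspace(1.0, 5.0, 400001)
p_star = np.min(xs ** 2)                  # attained at x* = 1

# Dual function: stationarity dL/dx = 2x - a = 0 gives x = a/2,
# so theta_D(a) = a - a^2 / 4.  Dual value d*: maximize over a >= 0.
alphas = np.linspace(0.0, 10.0, 400001)
d_star = np.max(alphas - alphas ** 2 / 4)  # attained at a* = 2

print(p_star, d_star)  # both close to 1: p* = d* for this problem
```

At the optimum the KKT conditions hold: $g(x^*) = 0$, so $\alpha^* g(x^*) = 0$ even though $\alpha^* = 2 > 0$.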
SVM dual: recall the primal problem
$\min_{w,b} \frac{1}{2}\|w\|^2$ s.t. $y_i(w^\top x_i + b) \ge 1, \forall i$
and its Lagrangian
$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i(w^\top x_i + b) - 1]$, where $\boldsymbol{\alpha}$ is the vector of Lagrange multipliers.
Setting the derivatives to zero:
$\frac{\partial \mathcal{L}}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \quad (1)$
$\frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; 0 = \sum_i \alpha_i y_i \quad (2)$
Plugging (1) and (2) back into $\mathcal{L}$:
$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad (3)$
combined with $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$.
The dual problem:
$\max_{\boldsymbol{\alpha}} \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j$
s.t. $\sum_i \alpha_i y_i = 0, \; \alpha_i \ge 0$
The decision function $w^\top x + b = \sum_i \alpha_i y_i x_i^\top x + b$ and the dual objective only depend on inner products between data points.
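Since the dual involves only $\boldsymbol{\alpha}$ and inner products, it can be attacked with simple numerical methods. A minimal sketch (my own illustration, not the lecture's code): projected gradient ascent on the dual for a toy separable dataset, with the bias $b$ dropped as a simplification so that the equality constraint $\sum_i \alpha_i y_i = 0$ disappears and only $\alpha_i \ge 0$ remains.

```python
import numpy as np

# Hard-margin SVM dual solved by projected gradient ascent.
# Simplification (my assumption): no bias term b, so the only
# constraint left is alpha_i >= 0.
X = np.array([[2., 2.], [3., 3.], [-2., -2.], [-3., -3.]])
y = np.array([1., 1., -1., -1.])

Q = (y[:, None] * y[None, :]) * (X @ X.T)  # Q_ij = y_i y_j x_i . x_j
alpha = np.zeros(len(y))
eta = 0.01
for _ in range(5000):
    alpha += eta * (1.0 - Q @ alpha)   # gradient of the dual objective
    alpha = np.maximum(alpha, 0.0)     # project onto alpha_i >= 0

w = (alpha * y) @ X                    # recover w = sum_i alpha_i y_i x_i
print(np.sign(X @ w))                  # matches y on this toy data
```

On this data only the two points closest to the boundary end up with $\alpha_i > 0$: the solution is determined by the support vectors, as the KKT complementarity conditions predict.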
Example: color histogram features.
Extract features: map an image to its red, green, and blue channel histograms, $x \mapsto \phi(x)$.
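A sketch of such a feature map (my own illustration; the bin count and normalization are my choices, not specified in the lecture):

```python
import numpy as np

# Color-histogram feature map phi(x): an RGB image becomes the
# concatenation of its normalized per-channel histograms.
def color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array -> feature vector of length 3 * bins."""
    feats = []
    for channel in range(3):  # red, green, blue
        hist, _ = np.histogram(image[:, :, channel],
                               bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize to a distribution
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
phi = color_histogram(img)
print(phi.shape)  # (24,)
```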
Kernel: $K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$
Polynomial kernel: $K(x, x') = (x^\top x' + c)^d$
Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar
Gaussian kernel: $K(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$
A related kernel: $K'(x, x') = \exp(x^\top x' / \sigma^2)$, which expands as a power series:
$K'(x, x') = \sum_{i}^{+\infty} \frac{(x^\top x')^i}{\sigma^{2i}\, i!}$
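The power-series expansion is easy to check numerically on scalars (a quick sanity check of my own; the values of $s$ and $\sigma$ are arbitrary):

```python
import math

# Check that exp(s / sigma^2) equals its power series
# sum_i s^i / (sigma^(2i) * i!), truncated at 30 terms.
s, sigma = 1.7, 1.3   # s stands in for the inner product x^T x'
series = sum(s**i / (sigma**(2 * i) * math.factorial(i)) for i in range(30))
direct = math.exp(s / sigma**2)
print(abs(series - direct) < 1e-12)  # True
```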
Mercer's condition: $K(x, x') = \sum_{i}^{+\infty} \lambda_i \phi_i(x) \phi_i(x')$ if and only if, for any function $f(x)$, $\int\int f(x) f(x') K(x, x')\, dx\, dx' \ge 0$ (omitting some math conditions on $f$ and $K$).

Constructing kernels: kernels are closed under operations such as sum, product, limit, and composition with a power series $\sum_{i}^{+\infty} a_i K^i(x, x')$.
Examples: if $K_1$ and $K_2$ are kernels, then so are
$K(x, x') = 2K_1(x, x') + 3K_2(x, x')$
$K(x, x') = \exp(K_1(x, x'))$
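These closure properties can be spot-checked by verifying that the resulting Gram matrices are positive semidefinite (my own illustration; the choice of base kernels $K_1$ linear and $K_2$ Gaussian, and of random data, is mine):

```python
import numpy as np

# Gram matrices of 2*K1 + 3*K2 and exp(K1) on random data should be PSD.
rng = np.random.default_rng(0)
X = 0.5 * rng.normal(size=(20, 5))

K1 = X @ X.T                                        # linear kernel
sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
K2 = np.exp(-sq_dists / 2.0)                        # Gaussian kernel, sigma = 1

# Elementwise exp corresponds to composing K1 with the exp power series.
for K in (2 * K1 + 3 * K2, np.exp(K1)):
    print(np.linalg.eigvalsh(K).min() >= -1e-8)     # True: numerically PSD
```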
Kernels as feature maps.
Extract features $x \mapsto \phi(x)$ (e.g., the red/green/blue color histogram above), then build the hypothesis on top:
$y = w^\top \phi(x)$
A linear model on the features $\phi(x)$ is a nonlinear model on the original input $x$.
Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar
Example: polynomial kernel as a feature map. For $K(x, x') = (x^\top x' + c)^2$ with two input dimensions,
$\phi(x) = (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2,\; \sqrt{2c}\, x_1,\; \sqrt{2c}\, x_2,\; c)$
$y = \mathrm{sign}(w^\top \phi(x) + b)$
The first layer ($\phi$) is fixed. If the first layer is also learned, the model becomes a two-layer neural network.
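The equivalence between the explicit feature map and the kernel can be verified directly (a sanity check of my own; the test points are arbitrary):

```python
import numpy as np

# Check that phi(x) . phi(x') equals the polynomial kernel
# (x^T x' + c)^2 in two dimensions.
def phi(x, c):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

c = 1.0
x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
kernel = (x @ xp + c)**2             # (1 + 1)^2 = 4
explicit = phi(x, c) @ phi(xp, c)
print(np.isclose(kernel, explicit))  # True
```

Computing the kernel costs $O(n)$ in the input dimension, while the explicit map has $O(n^2)$ features: this gap is the point of the kernel trick.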