
Introduction to Convex Optimization for Machine Learning, John Duchi



  1. Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall 2009 1 / 53

  2. Outline
     ◮ What is Optimization
     ◮ Convex Sets
     ◮ Convex Functions
     ◮ Convex Optimization Problems
     ◮ Lagrange Duality
     ◮ Optimization Algorithms
     ◮ Take Home Messages

  3. What is Optimization (and why do we care?)

  5. What is Optimization?
     ◮ Finding the minimizer of a function subject to constraints:
       \[ \begin{aligned} \underset{x}{\text{minimize}} \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le 0, \quad i \in \{1, \ldots, k\} \\ & h_j(x) = 0, \quad j \in \{1, \ldots, l\} \end{aligned} \]
     ◮ Example: Stock market. “Minimize variance of return subject to getting at least $50.”

  6. What is Optimization: Why do we care?
     Optimization is at the heart of many (most practical?) machine learning algorithms.
     ◮ Linear regression:
       \[ \underset{w}{\text{minimize}} \quad \|Xw - y\|^2 \]
     ◮ Classification (logistic regression or SVM):
       \[ \underset{w}{\text{minimize}} \quad \sum_{i=1}^{n} \log\left(1 + \exp(-y_i x_i^T w)\right) \]
       or
       \[ \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad \xi_i \ge 1 - y_i x_i^T w, \quad \xi_i \ge 0. \]
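
To make the objectives on this slide concrete, here is a minimal NumPy sketch that evaluates each of them. The function names (`least_squares`, `logistic_loss`, `svm_objective`) and the toy data are assumptions made for illustration and do not come from the slides. The SVM objective is written with each slack variable set to its smallest feasible value, \( \xi_i = \max(0, 1 - y_i x_i^T w) \).

```python
import numpy as np

def least_squares(w, X, y):
    """Linear regression objective ||Xw - y||^2."""
    r = X @ w - y
    return float(r @ r)

def logistic_loss(w, X, y):
    """Sum over i of log(1 + exp(-y_i x_i^T w)), labels y_i in {-1, +1}."""
    margins = y * (X @ w)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) without overflow
    return float(np.sum(np.logaddexp(0.0, -margins)))

def svm_objective(w, X, y, C=1.0):
    """||w||^2 + C * sum(xi_i) with xi_i = max(0, 1 - y_i x_i^T w)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return float(w @ w + C * np.sum(slack))

# Hypothetical toy data: three points in R^2 with labels in {-1, +1}
X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.zeros(2)

print(least_squares(w, X, y), logistic_loss(w, X, y), svm_objective(w, X, y))
```

At \( w = 0 \) every margin is zero, so the logistic loss is \( n \log 2 \) and each slack equals 1.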

  7. What is Optimization: We still care...
     ◮ Maximum likelihood estimation:
       \[ \underset{\theta}{\text{maximize}} \quad \sum_{i=1}^{n} \log p_\theta(x_i) \]
     ◮ Collaborative filtering:
       \[ \underset{w}{\text{minimize}} \quad \sum_{i \prec j} \log\left(1 + \exp(w^T x_i - w^T x_j)\right) \]
     ◮ k-means:
       \[ \underset{\mu_1, \ldots, \mu_k}{\text{minimize}} \quad J(\mu) = \sum_{j=1}^{k} \sum_{i \in C_j} \|x_i - \mu_j\|^2 \]
     ◮ And more (graphical models, feature selection, active learning, control)

  9. What is Optimization: But generally speaking...
     We’re screwed.
     ◮ Local (non global) minima of \( f_0 \)
     ◮ All kinds of constraints (even restricting to continuous functions):
       \[ h(x) = \sin(2\pi x) = 0 \]
       [Figure: 3D surface plot]
     ◮ Go for convex problems!

  10. Convex Sets

  11. Convex Sets: Definition
      A set \( C \subseteq \mathbb{R}^n \) is convex if for \( x, y \in C \) and any \( \alpha \in [0, 1] \),
      \[ \alpha x + (1 - \alpha) y \in C. \]
      [Figure: a convex set containing the line segment between x and y]

  14. Convex Sets: Examples
      ◮ All of \( \mathbb{R}^n \) (obvious)
      ◮ Non-negative orthant, \( \mathbb{R}^n_+ \): let \( x \succeq 0 \), \( y \succeq 0 \), clearly \( \alpha x + (1 - \alpha) y \succeq 0 \).
      ◮ Norm balls: let \( \|x\| \le 1 \), \( \|y\| \le 1 \), then
        \[ \|\alpha x + (1 - \alpha) y\| \le \|\alpha x\| + \|(1 - \alpha) y\| = \alpha \|x\| + (1 - \alpha) \|y\| \le 1. \]
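
The orthant and norm-ball arguments above can be spot-checked numerically. The following is a rough sketch, assuming the Euclidean norm and random sampling. The helper `random_point_in_unit_ball` is a hypothetical name introduced here, and a finite sample only illustrates the claim rather than proving it.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_point_in_unit_ball(n, rng):
    """Draw a point with Euclidean norm at most 1 (hypothetical helper)."""
    v = rng.normal(size=n)
    return v / max(1.0, np.linalg.norm(v))

worst_norm = 0.0
orthant_ok = True
for _ in range(1000):
    x = random_point_in_unit_ball(5, rng)
    y = random_point_in_unit_ball(5, rng)
    alpha = rng.uniform(0.0, 1.0)

    # Norm ball: the convex combination should stay inside the ball
    z = alpha * x + (1.0 - alpha) * y
    worst_norm = max(worst_norm, float(np.linalg.norm(z)))

    # Non-negative orthant: |x| and |y| are componentwise >= 0
    z_pos = alpha * np.abs(x) + (1.0 - alpha) * np.abs(y)
    orthant_ok = orthant_ok and bool(np.all(z_pos >= 0.0))

print(worst_norm, orthant_ok)
```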

  15. Convex Sets: Examples
      ◮ Affine subspaces: \( Ax = b \), \( Ay = b \), then
        \[ A(\alpha x + (1 - \alpha) y) = \alpha A x + (1 - \alpha) A y = \alpha b + (1 - \alpha) b = b. \]
        [Figure: 3D plot of an affine subspace, axes x_1, x_2, x_3]

  16. Convex Sets: More examples
      ◮ Arbitrary intersections of convex sets: let \( C_i \) be convex for \( i \in I \), \( C = \bigcap_{i} C_i \), then
        \[ x \in C, \; y \in C \;\Rightarrow\; \alpha x + (1 - \alpha) y \in C_i \quad \forall\, i \in I \]
        so \( \alpha x + (1 - \alpha) y \in C \).

  17. Convex Sets: More examples
      ◮ PSD matrices, a.k.a. the positive semidefinite cone \( S^n_+ \subset \mathbb{R}^{n \times n} \). \( A \in S^n_+ \) means \( x^T A x \ge 0 \) for all \( x \in \mathbb{R}^n \). For \( A, B \in S^n_+ \),
        \[ x^T (\alpha A + (1 - \alpha) B) x = \alpha x^T A x + (1 - \alpha) x^T B x \ge 0. \]
      ◮ On right (figure):
        \[ S^2_+ = \left\{ \begin{bmatrix} x & z \\ z & y \end{bmatrix} \succeq 0 \right\} = \left\{ x, y, z : x \ge 0, \; y \ge 0, \; xy \ge z^2 \right\} \]
        [Figure: 3D plot of the cone S^2_+, axes x, y, z]
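
As an illustrative check of the PSD-cone argument, the sketch below builds two random symmetric PSD matrices, confirms that their convex combinations stay PSD, and tests the \( 2 \times 2 \) characterization on two hand-picked matrices. The helpers `random_psd` and `is_psd` are hypothetical names, and symmetric matrices are assumed so that `numpy.linalg.eigvalsh` applies.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n, rng):
    """Return M^T M, which is symmetric PSD for any real square M."""
    M = rng.normal(size=(n, n))
    return M.T @ M

def is_psd(A, tol=1e-9):
    """Check a symmetric matrix via its smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

A, B = random_psd(4, rng), random_psd(4, rng)

# alpha * A + (1 - alpha) * B should be PSD for every alpha in [0, 1]
combos_psd = all(is_psd(a * A + (1.0 - a) * B) for a in np.linspace(0.0, 1.0, 11))

# 2x2 case: [[x, z], [z, y]] is PSD iff x >= 0, y >= 0, xy >= z^2
inside = np.array([[1.0, 0.5], [0.5, 1.0]])   # x = y = 1, z = 0.5: xy >= z^2
outside = np.array([[1.0, 2.0], [2.0, 1.0]])  # x = y = 1, z = 2:   xy <  z^2

print(combos_psd, is_psd(inside), is_psd(outside))
```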

  18. Convex Functions

  19. Convex Functions: Definition
      A function \( f : \mathbb{R}^n \to \mathbb{R} \) is convex if for \( x, y \in \operatorname{dom} f \) and any \( \alpha \in [0, 1] \),
      \[ f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y). \]
      [Figure: graph of f with the chord between (x, f(x)) and (y, f(y)); the chord value αf(x) + (1 − α)f(y) is marked]

  20. Convex Functions: First order convexity conditions
      Theorem. Suppose \( f : \mathbb{R}^n \to \mathbb{R} \) is differentiable. Then \( f \) is convex if and only if for all \( x, y \in \operatorname{dom} f \),
      \[ f(y) \ge f(x) + \nabla f(x)^T (y - x). \]
      [Figure: graph of f with the tangent line f(x) + ∇f(x)^T (y − x) at the point (x, f(x))]
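
The first-order condition can be spot-checked on the least-squares objective \( f(w) = \|Xw - y\|^2 \), whose gradient is \( 2 X^T (Xw - y) \). This is a minimal sketch with randomly generated data (an assumption for illustration). For this particular \( f \) the gap \( f(v) - f(w) - \nabla f(w)^T (v - w) \) equals \( \|X(v - w)\|^2 \), so it should never be negative beyond rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # hypothetical design matrix
y = rng.normal(size=20)       # hypothetical targets

def f(w):
    """Least-squares objective ||Xw - y||^2."""
    r = X @ w - y
    return r @ r

def grad_f(w):
    """Gradient 2 X^T (Xw - y)."""
    return 2.0 * X.T @ (X @ w - y)

# Smallest observed value of f(v) - [f(w) + grad f(w)^T (v - w)]
min_gap = np.inf
for _ in range(1000):
    w, v = rng.normal(size=3), rng.normal(size=3)
    gap = f(v) - (f(w) + grad_f(w) @ (v - w))
    min_gap = min(min_gap, gap)

print(min_gap)
```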

  21. Convex Functions: Actually, more general than that
      Definition. The subgradient set, or subdifferential set, \( \partial f(x) \) of \( f \) at \( x \) is
      \[ \partial f(x) = \left\{ g : f(y) \ge f(x) + g^T (y - x) \text{ for all } y \right\}. \]
      Theorem. \( f : \mathbb{R}^n \to \mathbb{R} \) is convex if and only if it has non-empty subdifferential set everywhere.
      [Figure: graph of f with a supporting line f(x) + g^T (y − x) through the point (x, f(x))]
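
A small example of a subgradient at a point where the function is not differentiable: for \( f(x) = \|x\|_1 \), the vector \( \operatorname{sign}(x) \) (with 0 in coordinates where \( x_i = 0 \)) is one element of \( \partial f(x) \). The sketch below checks the defining inequality on random points. The function names and the test point `x0` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def l1(x):
    """The l1 norm, which is convex but not differentiable where x_i = 0."""
    return np.sum(np.abs(x))

def l1_subgradient(x):
    """One element of the subdifferential: sign(x), with 0 where x_i = 0."""
    return np.sign(x)

x0 = np.array([1.5, 0.0, -2.0])  # the second coordinate is a kink
g = l1_subgradient(x0)

# Smallest observed value of f(y) - [f(x0) + g^T (y - x0)] over random y
min_gap = min(
    l1(yv) - (l1(x0) + g @ (yv - x0)) for yv in rng.normal(size=(1000, 3))
)

print(g, min_gap)
```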

  22. Convex Functions: Second order convexity conditions
      Theorem. Suppose \( f : \mathbb{R}^n \to \mathbb{R} \) is twice differentiable. Then \( f \) is convex if and only if for all \( x \in \operatorname{dom} f \),
      \[ \nabla^2 f(x) \succeq 0. \]
      [Figure: 3D surface plot]

  23. Convex Functions: Convex sets and convex functions
      Definition. The epigraph of a function \( f \) is the set of points
      \[ \operatorname{epi} f = \{ (x, t) : f(x) \le t \}. \]
      ◮ \( \operatorname{epi} f \) is convex if and only if \( f \) is convex.
      ◮ Sublevel sets, \( \{ x : f(x) \le a \} \), are convex for convex \( f \).
      [Figure: the region epi f above the graph of f, with level a marked]

  25. Convex Functions: Examples
      ◮ Linear/affine functions:
        \[ f(x) = b^T x + c. \]
      ◮ Quadratic functions:
        \[ f(x) = \tfrac{1}{2} x^T A x + b^T x + c \]
        for \( A \succeq 0 \). For regression:
        \[ \tfrac{1}{2} \|Xw - y\|^2 = \tfrac{1}{2} w^T X^T X w - y^T X w + \tfrac{1}{2} y^T y. \]
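
The regression identity above matches the quadratic form with \( A = X^T X \), \( b = -X^T y \), and \( c = \tfrac{1}{2} y^T y \). The sketch below, using randomly generated data as an assumption, checks the expansion numerically and confirms that \( X^T X \) has no negative eigenvalues beyond rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))  # hypothetical design matrix
y = rng.normal(size=10)       # hypothetical targets
w = rng.normal(size=4)

# Left-hand side: (1/2) ||Xw - y||^2
lhs = 0.5 * np.sum((X @ w - y) ** 2)

# Right-hand side: (1/2) w^T X^T X w - y^T X w + (1/2) y^T y
rhs = 0.5 * w @ X.T @ X @ w - y @ X @ w + 0.5 * y @ y

# The matrix A = X^T X of the quadratic form should be PSD
A = X.T @ X
min_eig = np.linalg.eigvalsh(A).min()

print(lhs, rhs, min_eig)
```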

  28. Convex Functions: More examples
      ◮ Norms (like \( \ell_1 \) or \( \ell_2 \) for regularization):
        \[ \|\alpha x + (1 - \alpha) y\| \le \|\alpha x\| + \|(1 - \alpha) y\| = \alpha \|x\| + (1 - \alpha) \|y\|. \]
      ◮ Composition with an affine function \( f(Ax + b) \):
        \[ f(A(\alpha x + (1 - \alpha) y) + b) = f(\alpha (Ax + b) + (1 - \alpha)(Ay + b)) \le \alpha f(Ax + b) + (1 - \alpha) f(Ay + b) \]
      ◮ Log-sum-exp (via \( \nabla^2 f(x) \) PSD):
        \[ f(x) = \log\left( \sum_{i=1}^{n} \exp(x_i) \right) \]
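
For log-sum-exp, the Hessian has the closed form \( \operatorname{diag}(p) - p p^T \), where \( p_i = \exp(x_i) / \sum_j \exp(x_j) \). The sketch below evaluates it at one arbitrarily chosen point (an assumption for illustration) and inspects the smallest eigenvalue. The Hessian annihilates the all-ones vector, so that eigenvalue is expected to be zero up to rounding.

```python
import numpy as np

def log_sum_exp(x):
    """log(sum_i exp(x_i)), shifted by max(x) for numerical stability."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def lse_hessian(x):
    """Hessian of log-sum-exp: diag(p) - p p^T with p = softmax(x)."""
    p = np.exp(x - log_sum_exp(x))
    return np.diag(p) - np.outer(p, p)

x = np.array([0.5, -1.0, 2.0, 0.0])  # arbitrary evaluation point
H = lse_hessian(x)
min_eig = np.linalg.eigvalsh(H).min()

print(min_eig)
```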

  29. Convex Functions: Important examples in Machine Learning
      ◮ SVM loss:
        \[ f(w) = \left[ 1 - y_i x_i^T w \right]_+ \]
      ◮ Binary logistic loss:
        \[ f(w) = \log\left( 1 + \exp(-y_i x_i^T w) \right) \]
      [Figure: plot of the curves [1 − x]_+ and log(1 + e^x)]
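
Both losses depend on \( w \) only through the margin \( m = y_i x_i^T w \), so they can be sketched as one-dimensional functions of \( m \). The code below is an illustrative sketch: the names `hinge` and `logistic` and the grid of margins are assumptions. It evaluates the two losses and runs a midpoint-convexity check on the grid, which is a sanity check and not a proof.

```python
import numpy as np

def hinge(m):
    """SVM (hinge) loss [1 - m]_+ as a function of the margin m."""
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    """Binary logistic loss log(1 + exp(-m)), computed stably."""
    return np.logaddexp(0.0, -m)

m = np.linspace(-2.0, 3.0, 11)

# Midpoint convexity on all grid pairs: f((a + b) / 2) <= (f(a) + f(b)) / 2
a, b = np.meshgrid(m, m)
mid = 0.5 * (a + b)
hinge_convex = bool(np.all(hinge(mid) <= 0.5 * (hinge(a) + hinge(b)) + 1e-12))
logistic_convex = bool(
    np.all(logistic(mid) <= 0.5 * (logistic(a) + logistic(b)) + 1e-12)
)

print(hinge_convex, logistic_convex)
```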
