
Optimization for Machine Learning Lecture 1: Introduction to Convexity

  1. Optimization for Machine Learning. Lecture 1: Introduction to Convexity. S.V.N. (vishy) Vishwanathan, Purdue University, vishy@purdue.edu. July 12, 2012. S.V.N. Vishwanathan (Purdue University), Optimization for Machine Learning, 1 / 43

  2. Regularized Risk Minimization. Machine Learning: We want to build a model which predicts well on data. A model's performance is quantified by a loss function (a sophisticated discrepancy score). Our model must generalize to unseen data. Avoid over-fitting by penalizing complex models (Regularization). More Formally: Training data: $\{x_1, \ldots, x_m\}$. Labels: $\{y_1, \ldots, y_m\}$. Learn a vector: $w$. $\min_w J(w) := \underbrace{\lambda \Omega(w)}_{\text{Regularizer}} + \underbrace{\frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, w)}_{\text{Risk } R_{\text{emp}}}$
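
The objective $J(w)$ on this slide can be sketched in a few lines of Python. The slide leaves the regularizer $\Omega$ and the loss $l$ unspecified, so this sketch assumes $\Omega(w) = \frac{1}{2}\|w\|^2$ and the hinge loss of a linear model; the names `hinge_loss` and `regularized_risk` and the toy data are illustrative choices, not part of the lecture.

```python
def hinge_loss(x, y, w):
    """Assumed loss: hinge loss of a linear model, max(0, 1 - y * <w, x>)."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return max(0.0, 1.0 - y * score)


def regularized_risk(w, xs, ys, lam, loss=hinge_loss):
    """J(w) = lam * Omega(w) + (1/m) * sum_i loss(x_i, y_i, w)."""
    m = len(xs)
    # Assumed regularizer: Omega(w) = 0.5 * ||w||^2
    regularizer = 0.5 * sum(wj * wj for wj in w)
    # Empirical risk R_emp: average loss over the m training examples
    empirical_risk = sum(loss(x, y, w) for x, y in zip(xs, ys)) / m
    return lam * regularizer + empirical_risk


# Toy data, purely illustrative
xs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
ys = [1.0, 1.0, -1.0, -1.0]
print(regularized_risk((1.0, 1.0), xs, ys, lam=0.1))
print(regularized_risk((0.0, 0.0), xs, ys, lam=0.1))
```

Increasing `lam` shifts the balance toward the regularizer, so a minimizer is pushed toward smaller $\|w\|$ at the cost of a larger empirical risk.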

  7. Convex Functions and Sets. Outline: (1) Convex Functions and Sets; (2) Operations Which Preserve Convexity; (3) First Order Properties; (4) Subgradients; (5) Constraints; (6) Warmup: Minimizing a 1-d Convex Function; (7) Warmup: Coordinate Descent

  8. Focus of my Lectures

  10. Focus of my Lectures. [Figure: 3-D surface plot]

  11. Disclaimer: My focus is on showing connections between various methods. I will sacrifice mathematical rigor and focus on intuition.

  12. Convex Function. A function $f$ is convex if, and only if, for all $x, x'$ and $\lambda \in (0, 1)$: $f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x')$. [Figure: plot of $f$ with the points $f(x)$ and $f(x')$ marked]
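
The defining inequality can be probed numerically for functions of one variable. The following is a minimal sketch, not a proof: it samples random chords on an interval and reports whether any of them violates the inequality. The helper names, the interval, and the tolerance `tol` are assumptions made for illustration.

```python
import random


def violates_convexity(f, x, xp, lam, tol=1e-12):
    """True if f(lam*x + (1-lam)*xp) exceeds lam*f(x) + (1-lam)*f(xp) by more than tol."""
    lhs = f(lam * x + (1.0 - lam) * xp)
    rhs = lam * f(x) + (1.0 - lam) * f(xp)
    return lhs > rhs + tol


def looks_convex(f, lo, hi, trials=10_000, seed=0):
    """Random search for a violated chord on [lo, hi].

    Returning True only means no counter-example was found; it is not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, xp = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.uniform(0.0, 1.0)
        if violates_convexity(f, x, xp, lam):
            return False
    return True


hinge = lambda x: max(0.0, 1.0 - x)   # convex
square = lambda x: 0.5 * x * x        # convex
cube = lambda x: x ** 3               # not convex on [-3, 3]

for name, f in [("hinge", hinge), ("square", square), ("cube", cube)]:
    print(name, looks_convex(f, -3.0, 3.0))
```

A single violated chord is enough to rule out convexity, whereas passing every sampled chord is only evidence in its favor.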

  13. Convex Function. A function $f$ is strictly convex if, and only if, for all $x \neq x'$ and $\lambda \in (0, 1)$: $f(\lambda x + (1 - \lambda) x') < \lambda f(x) + (1 - \lambda) f(x')$. [Figure: plot of $f$ with the points $f(x)$ and $f(x')$ marked]

  14. Convex Function. A function $f$ is $\sigma$-strongly convex if, and only if, $f(\cdot) - \frac{\sigma}{2} \|\cdot\|^2$ is convex. That is, for all $x, x'$ and $\lambda \in (0, 1)$: $f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x') - \frac{\sigma}{2} \lambda (1 - \lambda) \|x - x'\|^2$. [Figure: plot of $f$ with the points $f(x)$ and $f(x')$ marked]
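
The step from "$f - \frac{\sigma}{2}\|\cdot\|^2$ is convex" to the inequality with the $\lambda(1 - \lambda)$ term can be worked out as follows. This derivation assumes $\|\cdot\|$ is the Euclidean norm, which the slide does not state explicitly; the shorthand $g$ and $z$ is introduced here only for the derivation.

```latex
% Shorthand: g(x) = f(x) - (\sigma/2) \|x\|^2 and z = \lambda x + (1 - \lambda) x'.
\begin{align*}
g(z) &\le \lambda g(x) + (1 - \lambda) g(x')
  && \text{(convexity of } g\text{)} \\
f(z) &\le \lambda f(x) + (1 - \lambda) f(x')
  - \frac{\sigma}{2} \left( \lambda \|x\|^2 + (1 - \lambda) \|x'\|^2 - \|z\|^2 \right)
  && \text{(substitute } g\text{, rearrange)} \\
\lambda \|x\|^2 + (1 - \lambda) \|x'\|^2 - \|z\|^2
  &= \lambda (1 - \lambda) \left( \|x\|^2 - 2 \langle x, x' \rangle + \|x'\|^2 \right)
   = \lambda (1 - \lambda) \|x - x'\|^2
  && \text{(expand } \|z\|^2\text{)} \\
f(z) &\le \lambda f(x) + (1 - \lambda) f(x')
  - \frac{\sigma}{2} \lambda (1 - \lambda) \|x - x'\|^2 .
\end{align*}
```

Every step is reversible, so the two statements are equivalent under the Euclidean-norm assumption.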

  15. Exercise: Jensen's Inequality. Extend the definition of convexity to show that if $f$ is convex, then for all $\lambda_i \ge 0$ such that $\sum_i \lambda_i = 1$ we have $f\left(\sum_i \lambda_i x_i\right) \le \sum_i \lambda_i f(x_i)$.
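
Before attempting the proof, the inequality can be sanity-checked numerically. This sketch computes the gap between the two sides for random points and random normalized weights; the function name `jensen_gap` and the choice of $f(x) = \frac{1}{2}x^2$ are illustrative assumptions, and a non-negative gap on samples is of course not a proof.

```python
import random


def jensen_gap(f, xs, lams):
    """Return sum_i lam_i * f(x_i) - f(sum_i lam_i * x_i); non-negative when f is convex."""
    assert all(l >= 0 for l in lams) and abs(sum(lams) - 1.0) < 1e-9
    mean_point = sum(l * x for l, x in zip(lams, xs))
    return sum(l * f(x) for l, x in zip(lams, xs)) - f(mean_point)


rng = random.Random(0)
xs = [rng.uniform(-4.0, 4.0) for _ in range(5)]
raw = [rng.random() for _ in range(5)]
lams = [r / sum(raw) for r in raw]      # normalize so the weights sum to 1
gap = jensen_gap(lambda x: 0.5 * x * x, xs, lams)
print(gap >= 0.0)
```

With two points and weights $\lambda$ and $1 - \lambda$, the gap reduces to the defining inequality of convexity, which suggests induction on the number of points as one route to the proof.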

  16. Some Familiar Examples. $f(x) = \frac{1}{2} x^2$ (Square norm). [Figure: plot of $f$]

  17. Some Familiar Examples. $f(x, y) = \frac{1}{2} \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 10 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$. [Figure: surface plot of $f$]

  18. Some Familiar Examples. $f(x) = x \log x + (1 - x) \log(1 - x)$ (Negative entropy). [Figure: plot of $f$ on $(0, 1)$]

  19. Some Familiar Examples. $f(x, y) = x \log x + y \log y - x - y$ (Un-normalized negative entropy). [Figure: surface plot of $f$]

  20. Some Familiar Examples. $f(x) = \max(0, 1 - x)$ (Hinge Loss). [Figure: plot of $f$]

  21. Some Other Important Examples. Linear functions: $f(x) = ax + b$. Softmax: $f(x) = \log \sum_i \exp(x_i)$. Norms: for example the 2-norm $f(x) = \sqrt{\sum_i x_i^2}$.
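
The softmax function on this slide (often called log-sum-exp elsewhere) and the 2-norm are easy to write down directly. The sketch below is one possible implementation; subtracting the maximum before exponentiating is a standard overflow guard added here and is not mentioned on the slide, and the function names are illustrative.

```python
import math


def log_sum_exp(xs):
    """f(x) = log(sum_i exp(x_i)), shifted by max(xs) to avoid overflow."""
    shift = max(xs)
    return shift + math.log(sum(math.exp(x - shift) for x in xs))


def two_norm(xs):
    """f(x) = sqrt(sum_i x_i^2)."""
    return math.sqrt(sum(x * x for x in xs))


print(log_sum_exp([0.0, 0.0]))
print(log_sum_exp([1000.0, 1000.0]))   # a naive exp(1000.0) would overflow
print(two_norm([3.0, 4.0]))
```

The shift does not change the value of the function, since $\log \sum_i \exp(x_i) = c + \log \sum_i \exp(x_i - c)$ for any constant $c$.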

  22. Convex Sets. A set $C$ is convex if, and only if, for all $x, x' \in C$ and $\lambda \in (0, 1)$ we have $\lambda x + (1 - \lambda) x' \in C$.

  23. Convex Sets and Convex Functions. A function $f$ is convex if, and only if, its epigraph is a convex set.

  24. Convex Sets and Convex Functions. Indicator functions of convex sets are convex: $I_C(x) = \begin{cases} 0 & \text{if } x \in C \\ \infty & \text{otherwise} \end{cases}$

  25. Below sets of Convex Functions. $f(x, y) = x^2 + y^2$. [Figure: surface plot of $f$]

  26. Below sets of Convex Functions. $f(x, y) = x \log x + y \log y - x - y$. [Figure: surface plot of $f$]

  27. Below sets of Convex Functions. If $f$ is convex, then all its below sets $\{x : f(x) \le c\}$ (also called sublevel sets) are convex. Is the converse true? (Exercise: construct a counter-example)
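
One candidate counter-example, offered as a sketch rather than as the intended solution, is $f(x) = \sqrt{|x|}$. Its below sets $\{x : f(x) \le c\}$ are the intervals $[-c^2, c^2]$ for $c \ge 0$ (and empty for $c < 0$), which are convex, yet a single chord already violates the convexity inequality. The choice of function and the helper name `in_below_set` are assumptions made for illustration.

```python
import math

f = lambda x: math.sqrt(abs(x))


def in_below_set(x, c):
    """Membership in the below set {x : f(x) <= c}, i.e. the interval [-c**2, c**2] for c >= 0."""
    return f(x) <= c


# The convexity inequality fails on the chord between x = 0 and x' = 4 at lam = 0.5.
x, xp, lam = 0.0, 4.0, 0.5
lhs = f(lam * x + (1 - lam) * xp)       # f(2) = sqrt(2)
rhs = lam * f(x) + (1 - lam) * f(xp)    # 0.5 * 0 + 0.5 * 2 = 1
print(lhs > rhs)
```

Functions whose below sets are all convex are usually called quasi-convex, a strictly larger class than the convex functions.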

  28. Minima on Convex Sets. The set of minima of a convex function is a convex set. Proof: Consider the set $\{x : f(x) \le f^*\}$, where $f^*$ denotes the minimum value of $f$.

  29. Minima on Convex Sets. The set of minima of a strictly convex function is a singleton (whenever a minimum exists). Proof: try this at home!

  30. Operations Which Preserve Convexity. Outline: (1) Convex Functions and Sets; (2) Operations Which Preserve Convexity; (3) First Order Properties; (4) Subgradients; (5) Constraints; (6) Warmup: Minimizing a 1-d Convex Function; (7) Warmup: Coordinate Descent
