
Nondifferentiable Convex Functions
DS-GA 1013 / MATH-GA 2824, Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda
Outline: Applications, Subgradients, Optimization methods, Regression


  1-2. Proof
Assume g is a subgradient at x. For any α ≥ 0,
f(x + α e_i) ≥ f(x) + α g^T e_i = f(x) + α g[i]
f(x − α e_i) ≥ f(x) − α g^T e_i = f(x) − α g[i]
Combining both inequalities,
( f(x) − f(x − α e_i) ) / α  ≤  g[i]  ≤  ( f(x + α e_i) − f(x) ) / α
Letting α → 0 implies g[i] = ∂f(x) / ∂x[i]

  3. Subgradient
A vector g ∈ R^n is a subgradient of f : R^n → R at x ∈ R^n if
f(y) ≥ f(x) + g^T (y − x)  for all y ∈ R^n
A function f : R^n → R is convex if and only if it has a subgradient at every point.
It is strictly convex if and only if for all x ∈ R^n there exists g ∈ R^n such that
f(y) > f(x) + g^T (y − x)  for all y ≠ x
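The subgradient inequality can be checked numerically. Below is a minimal sketch (the helper is_subgradient is not from the slides; it simply samples random points y and tests f(y) ≥ f(x) + g^T(y − x)), using the absolute value as a test case.

```python
import numpy as np

def is_subgradient(f, x, g, trials=10_000, seed=0):
    """Empirically test the subgradient inequality f(y) >= f(x) + g.(y - x)
    at randomly sampled points y (a necessary check, not a proof)."""
    rng = np.random.default_rng(seed)
    x, g = np.atleast_1d(x).astype(float), np.atleast_1d(g).astype(float)
    for _ in range(trials):
        y = x + 10 * rng.standard_normal(x.shape)
        if f(y) < f(x) + g @ (y - x) - 1e-12:
            return False
    return True

f_abs = lambda v: float(np.abs(v).sum())    # f(x) = |x| in one dimension
print(is_subgradient(f_abs, [0.0], [0.5]))  # True:  any g with |g| <= 1 works at 0
print(is_subgradient(f_abs, [0.0], [1.5]))  # False: |g| > 1 violates the inequality
```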

  4-6. Optimality condition for nondifferentiable functions
If 0 is a subgradient of f at x, then
f(y) ≥ f(x) + 0^T (y − x) = f(x)  for all y ∈ R^n
so x is a minimizer of f. Under strict convexity the minimum is unique.

  7-10. Sum of subgradients
Let g_1 and g_2 be subgradients at x ∈ R^n of f_1 : R^n → R and f_2 : R^n → R.
Then g := g_1 + g_2 is a subgradient of f := f_1 + f_2 at x.
Proof: For any y ∈ R^n,
f(y) = f_1(y) + f_2(y)
     ≥ f_1(x) + g_1^T (y − x) + f_2(x) + g_2^T (y − x)
     = f(x) + g^T (y − x)
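A small numerical check of the sum rule (an illustrative sketch: f_1(x) = |x| and f_2(x) = (x − 1)^2 are chosen only for this example; g_1 = 0.5 is one of many valid subgradients of |·| at 0):

```python
import numpy as np

# f1(x) = |x|: any g1 in [-1, 1] is a subgradient at x = 0 (see the later slides)
# f2(x) = (x - 1)^2: differentiable, so its only subgradient at 0 is f2'(0) = -2
f = lambda x: np.abs(x) + (x - 1) ** 2
x0 = 0.0
g = 0.5 + (-2.0)                      # g1 + g2

y = np.linspace(-5, 5, 1001)
assert np.all(f(y) >= f(x0) + g * (y - x0) - 1e-12)
print("g1 + g2 is a subgradient of f1 + f2 at x = 0")
```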

  11-14. Subgradient of scaled function
Let g_1 be a subgradient at x ∈ R^n of f_1 : R^n → R. For any η ≥ 0,
g_2 := η g_1 is a subgradient of f_2 := η f_1 at x.
Proof: For any y ∈ R^n,
f_2(y) = η f_1(y)
       ≥ η ( f_1(x) + g_1^T (y − x) )
       = f_2(x) + g_2^T (y − x)

  15-18. Subdifferential of absolute value
f(x) = |x|
At x ≠ 0, f(x) = |x| is differentiable, so g = sign(x).
At x = 0, we need f(0 + y) ≥ f(0) + g (y − 0), i.e. |y| ≥ g y,
which holds if and only if |g| ≤ 1.

  19. Subdifferential of ℓ1 norm
g is a subgradient of the ℓ1 norm at x ∈ R^n if and only if
g[i] = sign(x[i])  if x[i] ≠ 0
|g[i]| ≤ 1         if x[i] = 0
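A sketch that builds one such subgradient and tests the subgradient inequality on random points (the helper name l1_subgradient and the random choice in [−1, 1] at zero entries are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def l1_subgradient(x):
    """One subgradient of ||.||_1 at x: sign(x[i]) where x[i] != 0,
    an arbitrary value in [-1, 1] where x[i] == 0."""
    g = np.sign(x).astype(float)
    zero = (x == 0)
    g[zero] = rng.uniform(-1.0, 1.0, size=zero.sum())
    return g

x = np.array([1.5, 0.0, -2.0, 0.0])
g = l1_subgradient(x)
for _ in range(10_000):
    y = 5 * rng.standard_normal(x.shape)
    # subgradient inequality ||y||_1 >= ||x||_1 + g^T (y - x)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-12
print("inequality holds for g =", g)
```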

  20. Proof
g is a subgradient of ||·||_1 at x if and only if g[i] is a subgradient of |·| at x[i] for all 1 ≤ i ≤ n

  21-23. Proof
If g is a subgradient of ||·||_1 at x, then for any y ∈ R and any 1 ≤ i ≤ n,
|y| = |x[i]| + || x + (y − x[i]) e_i ||_1 − ||x||_1
    ≥ |x[i]| + ||x||_1 + g^T (y − x[i]) e_i − ||x||_1
    = |x[i]| + g[i] (y − x[i])
so g[i] is a subgradient of |·| at x[i] for all 1 ≤ i ≤ n

  24-26. Proof
If g[i] is a subgradient of |·| at x[i] for 1 ≤ i ≤ n, then for any y ∈ R^n,
||y||_1 = Σ_{i=1}^n |y[i]|
        ≥ Σ_{i=1}^n ( |x[i]| + g[i] (y[i] − x[i]) )
        = ||x||_1 + g^T (y − x)
so g is a subgradient of ||·||_1 at x

  27-29. Subdifferential of ℓ1 norm (figures illustrating the subdifferential)

  30. Subdifferential of the nuclear norm
Let X ∈ R^{m×n} be a rank-r matrix with SVD U S V^T, where U ∈ R^{m×r}, V ∈ R^{n×r} and S ∈ R^{r×r}.
A matrix G is a subgradient of the nuclear norm at X if and only if
G := U V^T + W
where W satisfies
||W|| ≤ 1,  U^T W = 0,  W V = 0
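This characterization can be sanity-checked numerically. The sketch below builds a W satisfying the three conditions by projecting a random matrix onto the complements of col(U) and row(X) and rescaling; this particular construction is an illustrative choice, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2

# rank-r matrix X = U S V^T with orthonormal U, V
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
X = U @ np.diag(rng.uniform(1.0, 3.0, r)) @ V.T

# W with U^T W = 0, W V = 0 and ||W|| <= 1 (projection of a random matrix)
W = (np.eye(m) - U @ U.T) @ rng.standard_normal((m, n)) @ (np.eye(n) - V @ V.T)
W /= 2.0 * np.linalg.norm(W, 2)       # spectral norm 1/2
G = U @ V.T + W
assert np.allclose(U.T @ W, 0) and np.allclose(W @ V, 0)

# subgradient inequality ||Y||_* >= ||X||_* + <G, Y - X> on random Y
for _ in range(1000):
    Y = 3 * rng.standard_normal((m, n))
    assert np.linalg.norm(Y, 'nuc') >= np.linalg.norm(X, 'nuc') + np.sum(G * (Y - X)) - 1e-9
print("G = U V^T + W passes the subgradient inequality")
```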

  31-36. Proof
By Pythagoras' theorem, for any x ∈ R^n with unit ℓ2 norm we have
|| P_row(X) x ||_2^2 + || P_row(X)^⊥ x ||_2^2 = || x ||_2^2 = 1
The rows of U V^T are in row(X) and the rows of W in row(X)^⊥, so
||G||^2 := max_{ ||x||_2 = 1 } || G x ||_2^2
         = max_{ ||x||_2 = 1 } ( || U V^T x ||_2^2 + || W x ||_2^2 )
         = max_{ ||x||_2 = 1 } ( || U V^T P_row(X) x ||_2^2 + || W P_row(X)^⊥ x ||_2^2 )
         ≤ max_{ ||x||_2 = 1 } ( || U V^T ||^2 || P_row(X) x ||_2^2 + || W ||^2 || P_row(X)^⊥ x ||_2^2 )
         ≤ 1
since || U V^T || = 1 and || W || ≤ 1
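Continuing in the same spirit, a short numerical check of this operator-norm bound (again with an illustrative random W, here rescaled so that ||W|| = 1, the largest value the conditions allow):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W = (np.eye(m) - U @ U.T) @ rng.standard_normal((m, n)) @ (np.eye(n) - V @ V.T)
W /= np.linalg.norm(W, 2)             # ||W|| = 1

G = U @ V.T + W
print(np.linalg.norm(U @ V.T, 2),     # 1.0: U, V have orthonormal columns
      np.linalg.norm(W, 2),           # 1.0 by construction
      np.linalg.norm(G, 2))           # still <= 1, as the bound predicts
```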

  37. Hölder's inequality for matrices
For any matrix A ∈ R^{m×n},
||A||_* = sup_{ B ∈ R^{m×n}, ||B|| ≤ 1 } ⟨A, B⟩
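A quick numerical illustration (a sketch using only standard numpy; that the supremum is attained at B = U V^T, with U and V from an SVD of A, is consistent with the computation on the following slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B_star = U @ Vt                                      # feasible: ||B_star|| = 1
print(np.sum(A * B_star), np.linalg.norm(A, 'nuc'))  # both equal sum(s)

# no feasible B exceeds the nuclear norm
for _ in range(1000):
    B = rng.standard_normal(A.shape)
    B /= max(np.linalg.norm(B, 2), 1.0)              # enforce ||B|| <= 1
    assert np.sum(A * B) <= np.linalg.norm(A, 'nuc') + 1e-9
print("sup over ||B|| <= 1 of <A, B> is ||A||_*")
```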

  38. Proof
For any matrix Y ∈ R^{m×n},
||Y||_* ≥ ⟨G, Y⟩        (by Hölder's inequality, since ||G|| ≤ 1 as shown above)
        = ⟨G, X⟩ + ⟨G, Y − X⟩
        = ⟨U V^T, X⟩ + ⟨W, X⟩ + ⟨G, Y − X⟩

  39-44. Proof
U^T W = 0 implies
⟨W, X⟩ = ⟨W, U S V^T⟩ = ⟨U^T W, S V^T⟩ = 0
and
⟨U V^T, X⟩ = tr( V U^T X )
           = tr( V U^T U S V^T )
           = tr( V^T V S )
           = tr( S )
           = ||X||_*

  45. Proof
For any matrix Y ∈ R^{m×n},
||Y||_* ≥ ⟨G, Y⟩
        = ⟨G, X⟩ + ⟨G, Y − X⟩
        = ⟨U V^T, X⟩ + ⟨W, X⟩ + ⟨G, Y − X⟩
        = ||X||_* + ⟨G, Y − X⟩
so G is a subgradient of the nuclear norm at X

  46. Sparse linear regression with 2 features
y := α x_1 + z
X := [ x_1  x_2 ]
||x_1||_2 = 1,  ||x_2||_2 = 1,  ⟨x_1, x_2⟩ = ρ
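A sketch of this setup in code (n, α, ρ and the noise level are illustrative choices; x_2 is built from x_1 plus an orthogonal unit vector so that the norms and inner product come out exactly as specified):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, rho, sigma = 100, 1.0, 0.8, 0.05   # illustrative values

x1 = rng.standard_normal(n)
x1 /= np.linalg.norm(x1)                     # ||x1||_2 = 1
u = rng.standard_normal(n)
u -= (u @ x1) * x1                           # make u orthogonal to x1
u /= np.linalg.norm(u)
x2 = rho * x1 + np.sqrt(1 - rho ** 2) * u    # ||x2||_2 = 1, <x1, x2> = rho

z = sigma * rng.standard_normal(n)
y = alpha * x1 + z                           # the data only involve x1
X = np.column_stack([x1, x2])
print(np.linalg.norm(x1), np.linalg.norm(x2), x1 @ x2)
```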

  47. Analysis of lasso estimator
Let α ≥ 0. Then
β_lasso = ( α + x_1^T z − λ ,  0 )
as long as
| x_2^T z − ρ x_1^T z | / ( 1 − |ρ| )  ≤  λ  ≤  α + x_1^T z
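This prediction can be checked against an off-the-shelf solver (a sketch reusing the setup above; note that scikit-learn's Lasso minimizes (1/(2n))||y − Xβ||^2 + a||β||_1, so its parameter a corresponds to λ/n for the objective used here):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, alpha_true, rho, sigma = 100, 1.0, 0.8, 0.05
x1 = rng.standard_normal(n); x1 /= np.linalg.norm(x1)
u = rng.standard_normal(n); u -= (u @ x1) * x1; u /= np.linalg.norm(u)
x2 = rho * x1 + np.sqrt(1 - rho ** 2) * u
z = sigma * rng.standard_normal(n)
y = alpha_true * x1 + z
X = np.column_stack([x1, x2])

lam_lo = abs(x2 @ z - rho * (x1 @ z)) / (1 - abs(rho))
lam_hi = alpha_true + x1 @ z
for lam in np.linspace(lam_lo, lam_hi, 5):
    fit = Lasso(alpha=lam / n, fit_intercept=False, tol=1e-10).fit(X, y)
    print(f"lambda={lam:.3f}  lasso={fit.coef_.round(3)}  "
          f"predicted=({alpha_true + x1 @ z - lam:.3f}, 0)")
```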

  48. Lasso estimator: plot of the coefficients (vertical axis, 0 to 1.0) against the regularization parameter (horizontal axis, 0 to 0.20)
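A coefficient path like this one can be reproduced with scikit-learn's lasso_path (a sketch on the same illustrative data as above; lasso_path uses the same (1/(2n)) scaling of the quadratic term, hence the λ/n conversion):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, alpha_true, rho = 100, 1.0, 0.8
x1 = rng.standard_normal(n); x1 /= np.linalg.norm(x1)
u = rng.standard_normal(n); u -= (u @ x1) * x1; u /= np.linalg.norm(u)
x2 = rho * x1 + np.sqrt(1 - rho ** 2) * u
y = alpha_true * x1 + 0.05 * rng.standard_normal(n)
X = np.column_stack([x1, x2])

lams = np.linspace(0.001, 0.2, 100)
alphas_out, coefs, _ = lasso_path(X, y, alphas=lams / n)
for i in range(coefs.shape[0]):
    plt.plot(alphas_out * n, coefs[i], label=f"coefficient {i + 1}")
plt.xlabel("Regularization parameter")
plt.ylabel("Coefficients")
plt.legend()
plt.show()
```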

  49. Optimality condition for nondifferentiable functions
If 0 is a subgradient of f at x, then
f(y) ≥ f(x) + 0^T (y − x) = f(x)  for all y ∈ R^n
Under strict convexity the minimum is unique

  50. Proof
The cost function is strictly convex if n ≥ 2 and ρ ≠ 1.
Aim: show that there is a subgradient equal to 0 at a 1-sparse solution.

  51. Proof
The gradient of the quadratic term
q(β) := (1/2) || X β − y ||_2^2
at β_lasso equals
∇q( β_lasso ) = X^T ( X β_lasso − y )
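A finite-difference check of this gradient expression (a minimal sketch with arbitrary dimensions and data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
y = rng.standard_normal(30)
beta = rng.standard_normal(4)

q = lambda b: 0.5 * np.sum((X @ b - y) ** 2)
grad = X.T @ (X @ beta - y)            # claimed gradient of q at beta

eps = 1e-6
fd = np.array([(q(beta + eps * e) - q(beta - eps * e)) / (2 * eps)
               for e in np.eye(4)])
print(np.max(np.abs(grad - fd)))       # tiny: the expressions agree
```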
