
The How of (Sub)Gradient



  1. The How of (Sub)Gradient. Note: the subdifferential is an intersection of infinitely many half-spaces and is therefore convex and closed. August 31, 2018 82 / 402

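  2. The How of (Sub)Gradient. Note: the subdifferential is an intersection of infinitely many half-spaces and is therefore a closed convex set even if $f$ is NOT convex. Concretely, $\partial f(x) = \bigcap_{y} \{ h : f(y) \ge f(x) + h^\top (y - x) \}$, and each set in the intersection is a closed half-space in $h$ (or the whole space when $y = x$).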

  3. First peek into subgradient calculus: Function Convexity First. The following functions are convex, but may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability? Pointwise maximum: if $f_1, f_2, \ldots, f_m$ are convex, then $f(x) = \max\{ f_1(x), f_2(x), \ldots, f_m(x) \}$ is convex. In Quiz 1, problem 1: $m = 2$, $f_1(x) = \|x\|_1$, $f_2(x) = \|x\|_\infty$.

  4. First peek into subgradient calculus (continued). Pointwise maximum, example: ▶ The sum of the $r$ largest components of $x \in \Re^n$, $f(x) = x_{[1]} + x_{[2]} + \ldots + x_{[r]}$, where $x_{[i]}$ is the $i$-th largest component of $x$, is a convex function, since it is the maximum of $x_{i_1} + \ldots + x_{i_r}$ over all choices of $r$ distinct indices. Proof that the pointwise maximum is convex: either from first principles (invoking convexity of $f_1, \ldots, f_m$), or by inspecting the intersection of the epigraphs of $f_1, \ldots, f_m$. Will the proof of convexity hold for an infinite (possibly even uncountable) number of indices $i$ (which ranged over the finite set $1, \ldots, m$ above)? Answer: yes.

  5. First peek into subgradient calculus (continued). Pointwise supremum: if $f(x, y)$ is convex in $x$ for every $y \in S$, then $g(x) = \sup_{y \in S} f(x, y)$ is convex. Here $S$ is a set of possibly infinitely many indices. The proof is similar to the one on the board: the RHS will have a sup over $y$ instead of a max over $i$, and similarly the LHS will have a sup over $y$ instead of a max over $i$.

  6. First peek into subgradient calculus (continued). Pointwise supremum, example: ▶ The function that returns the maximum eigenvalue of a symmetric matrix $X$, viz. $\lambda_{\max}(X) = \sup_{y \in S} \frac{\|Xy\|_2}{\|y\|_2}$, is a convex function, obtained as the supremum of $\|Xy\|_2$ over the infinitely many $y$ with $\|y\|_2 = 1$. (This ratio is the spectral norm of $X$; it equals $\lambda_{\max}(X)$ when $X$ is positive semidefinite.)

  7. First peek into subgradient calculus (continued). The maximum eigenvalue $\lambda_{\max}(X)$ is a convex function of the symmetric matrix $X$. If $X$ is symmetric, the maximum eigenvalue of $X^\top X$ is the square of the largest-magnitude eigenvalue of $X$. For a general symmetric $X$, $\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^\top X y$, which is again a supremum of functions that are convex (indeed linear) in $X$.
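As a rough numerical illustration of the pointwise-maximum rule on these slides, the sketch below (Python with NumPy, not part of the original deck) writes the sum of the $r$ largest components as a maximum over all size-$r$ index subsets and spot-checks the convexity inequality on random points. The helper names `sum_r_largest` and `sum_r_largest_as_max` are hypothetical, chosen only for this example.

```python
import itertools
import numpy as np


def sum_r_largest(x, r):
    """Sum of the r largest components of x (direct computation via sorting)."""
    return float(np.sort(x)[-r:].sum())


def sum_r_largest_as_max(x, r):
    """Same value, written as a pointwise maximum of linear functions:
    one linear function x -> sum of x[i] over idx for every size-r subset idx."""
    n = len(x)
    return float(max(x[list(idx)].sum()
                     for idx in itertools.combinations(range(n), r)))


rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
r = 2

# Both formulations agree.
assert np.isclose(sum_r_largest(x, r), sum_r_largest_as_max(x, r))

# Spot check of convexity: f(t x + (1 - t) y) <= t f(x) + (1 - t) f(y).
for t in np.linspace(0.0, 1.0, 11):
    lhs = sum_r_largest(t * x + (1 - t) * y, r)
    rhs = t * sum_r_largest(x, r) + (1 - t) * sum_r_largest(y, r)
    assert lhs <= rhs + 1e-9
```

The subset enumeration is exponential in general and is only meant to make the "maximum of linear functions" structure visible; the sorted version is what one would use in practice.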

  8. Basic Subgradient Calculus: Illustration for pointwise maximum. Finite pointwise maximum: if $f(x) = \max_{i = 1, \ldots, m} f_i(x)$, then $\partial f(x)$ is: (a) the subdifferential of $f_i$ at points $x$ where $f(x) = f_i(x)$ for a single $i$ (that is, at points where there is a unique, unambiguous maximizer, the subdifferential of $f$ is the subdifferential of that unique maximizer); (b) in general, the convex hull of the subdifferentials of $f_i$ at $x$ for all $i$ such that $f(x) = f_i(x)$. The convex hull includes the union.

  9. Basic Subgradient Calculus (continued). Finite pointwise maximum: if $f(x) = \max_{i = 1, \ldots, m} f_i(x)$, then $\partial f(x) = \mathrm{conv}\left( \bigcup_{i : f_i(x) = f(x)} \partial f_i(x) \right)$, which is the convex hull of the union of the subdifferentials of all active functions at $x$. General pointwise maximum: if $f(x) = \max_{s \in S} f_s(x)$, then, under some regularity conditions (on $S$, $f_s$), $\partial f(x)$ is the closure of the convex hull of the union of the subdifferentials of the active functions. The closure is an additional operation that ensures the subdifferential is closed.

  10. Basic Subgradient Calculus (continued). General pointwise maximum, in symbols: if $f(x) = \max_{s \in S} f_s(x)$, then, under some regularity conditions (on $S$, $f_s$), $\partial f(x) = \mathrm{cl}\left\{ \mathrm{conv}\left( \bigcup_{s : f_s(x) = f(x)} \partial f_s(x) \right) \right\}$.
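The finite pointwise-maximum rule can be sketched for the simplest case, a maximum of affine functions $f_i(x) = a_i^\top x + b_i$, where each $\partial f_i(x)$ is the single vector $a_i$. The code below is an illustrative sketch under that assumption (Python with NumPy, not from the slides); `active_set` and `subgradient` are hypothetical helper names, and the tolerance `tol` is an arbitrary choice for deciding which pieces count as active.

```python
import numpy as np


def max_affine(A, b, x):
    """f(x) = max_i (a_i^T x + b_i), with a_i the rows of A."""
    return float(np.max(A @ x + b))


def active_set(A, b, x, tol=1e-9):
    """Indices i with f_i(x) = f(x), up to the tolerance tol."""
    vals = A @ x + b
    return np.flatnonzero(vals >= vals.max() - tol)


def subgradient(A, b, x, weights=None, tol=1e-9):
    """One element of conv{a_i : i active}: a convex combination of the
    gradients of the active pieces (uniform weights by default)."""
    idx = active_set(A, b, x, tol)
    if weights is None:
        weights = np.full(len(idx), 1.0 / len(idx))
    return weights @ A[idx]


# f(x) = max(x_1, x_2, -x_1 - x_2); at x = (1, 1) the first two pieces tie.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([1.0, 1.0])
g = subgradient(A, b, x)

# Subgradient inequality f(y) >= f(x) + g^T (y - x) at random points y.
rng = np.random.default_rng(1)
for _ in range(100):
    y = rng.normal(size=2)
    assert max_affine(A, b, y) >= max_affine(A, b, x) + g @ (y - x) - 1e-9
```

At a point with a unique active piece the function returns that piece's gradient, matching case (a) on the slide; at ties, any choice of nonnegative `weights` summing to one gives another valid subgradient.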

  11. Subgradient of $\|x\|_1$. Assume $x \in \Re^n$. Then $\|x\|_1$ is a maximum over $2^n$ functions, each of the form $s^\top x$.

  12. Subgradient of $\|x\|_1$ (continued). $\|x\|_1 = \max_{s \in \{-1, +1\}^n} x^\top s$, which is a pointwise maximum of $2^n$ functions. Let $S^* \subseteq \{-1, +1\}^n$ be the set of $s$ such that for each $s \in S^*$, the value of $x^\top s$ is the same max value. Thus, $\partial \|x\|_1 = \mathrm{conv}\left( \bigcup_{s \in S^*} \{ s \} \right)$.
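For small $n$ the set $S^*$ can simply be enumerated. The sketch below (Python with NumPy, not from the slides; `maximizing_signs` is a hypothetical helper name) lists the sign vectors that attain the maximum and checks that their average, which lies in the convex hull, equals the familiar choice $\mathrm{sign}(x)$: coordinates with $x_i \ne 0$ are fixed to $\mathrm{sign}(x_i)$, while coordinates with $x_i = 0$ are free in $\{-1, +1\}$ and average to zero.

```python
import itertools
import numpy as np


def maximizing_signs(x, tol=1e-12):
    """All s in {-1, +1}^n with x^T s equal to the maximum value ||x||_1.
    Enumerates 2^n sign vectors, so this is only sensible for small n."""
    n = len(x)
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    vals = signs @ x
    return signs[vals >= vals.max() - tol]


x = np.array([1.5, 0.0, -2.0])
S_star = maximizing_signs(x)

# The common maximum value is the l1 norm of x.
assert np.isclose((S_star @ x).max(), np.abs(x).sum())

# One element of conv(S*): the average of the maximizers equals sign(x).
g = S_star.mean(axis=0)
assert np.allclose(g, np.sign(x))
```

Here $S^*$ contains two vectors, differing only in the coordinate where $x_i = 0$, so the convex hull lets that coordinate of the subgradient take any value in $[-1, 1]$.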

  13. More Subgradient Calculus: Function Convexity First. The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability? Nonnegative weighted sum: $f = \sum_{i=1}^{n} \alpha_i f_i$ is convex if each $f_i$ for $1 \le i \le n$ is convex and $\alpha_i \ge 0$, $1 \le i \le n$. Composition with affine function: $f(Ax + b)$ is convex if $f$ is convex. For example: ▶ The log barrier for linear inequalities, $f(x) = -\sum_{i=1}^{m} \log(b_i - a_i^\top x)$, is convex since $-\log(x)$ is convex. ▶ Any norm of an affine function, $f(x) = \|Ax + b\|$, is convex. If $A$ is $m \times n$, then $f(\cdot)$ is defined on $\Re^m$, whereas $f(Ax + b)$, as a function of $x$, is defined on $\Re^n$.
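The log barrier example can be tried numerically. The sketch below (Python with NumPy, not from the slides) evaluates $f(x) = -\sum_i \log(b_i - a_i^\top x)$ for a small, made-up set of inequalities describing the box $-1 \le x_1, x_2 \le 1$, returns $+\infty$ outside the domain (a common convention, assumed here), and spot-checks the convexity inequality along a segment between two interior points. The function name `log_barrier` and the specific `A`, `b`, `x`, `y` are illustrative assumptions.

```python
import numpy as np


def log_barrier(A, b, x):
    """f(x) = -sum_i log(b_i - a_i^T x), with a_i the rows of A.
    Returns +inf when some inequality a_i^T x < b_i is violated."""
    slack = b - A @ x
    if np.any(slack <= 0):
        return np.inf
    return float(-np.sum(np.log(slack)))


# Inequalities x_1 <= 1, -x_1 <= 1, x_2 <= 1, -x_2 <= 1 (A is 4 x 2, so x is in R^2).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

x = np.array([0.5, -0.5])
y = np.array([-0.8, 0.3])

# Convexity spot check along the segment between two interior points.
for t in np.linspace(0.0, 1.0, 11):
    lhs = log_barrier(A, b, t * x + (1 - t) * y)
    rhs = t * log_barrier(A, b, x) + (1 - t) * log_barrier(A, b, y)
    assert lhs <= rhs + 1e-9
```

The shapes mirror the remark on the slide: $-\sum_i \log(\cdot)$ acts on the $m = 4$ slacks, while the composed barrier is a function of $x \in \Re^2$.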
