
More Subgradient Calculus: Function Convexity First - PowerPoint PPT Presentation



  1. More Subgradient Calculus: Function Convexity First
  The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?
  ▶ Nonnegative weighted sum: f = ∑_{i=1}^n α_i f_i is convex if each f_i, 1 ≤ i ≤ n, is convex and α_i ≥ 0 for 1 ≤ i ≤ n.
  ▶ Composition with an affine function: f(Ax + b) is convex if f is convex. For example:
    ▶ The log barrier for linear inequalities, f(x) = −∑_{i=1}^m log(b_i − a_i^T x), is convex, since −log(x) is convex.
    ▶ Any norm of an affine function, f(x) = ||Ax + b||, is convex.
  (Both examples are spot-checked numerically in the sketch below.)
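Since the deck itself carries no code, here is a minimal NumPy sketch (my illustration, not the slides') that numerically spot-checks the two examples above against the definition of convexity. The names `norm_affine` and `log_barrier` and all of the data are made up for the check; random sampling can only falsify convexity, never prove it, so this is a sanity check rather than an argument.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, size=m)  # positive offsets keep points near 0 inside the barrier's domain

def norm_affine(x):
    # Any norm of an affine function: f(x) = ||Ax + b|| (here the 2-norm).
    return np.linalg.norm(A @ x + b)

def log_barrier(x):
    # Log barrier f(x) = -sum_i log(b_i - a_i^T x); +inf outside the domain b - Ax > 0.
    s = b - A @ x
    return -np.sum(np.log(s)) if np.all(s > 0) else np.inf

for f in (norm_affine, log_barrier):
    for _ in range(1000):
        x, y = 0.1 * rng.standard_normal(n), 0.1 * rng.standard_normal(n)
        t = rng.uniform(0.01, 0.99)
        # Convexity: f(t x + (1-t) y) <= t f(x) + (1-t) f(y).
        assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
print("convexity checks passed on all sampled points")
```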

  2-5. More of Basic Subgradient Calculus
  ▶ Scaling: ∂(af) = a · ∂f, provided a > 0. The condition a > 0 ensures that af remains convex.
  ▶ Addition: ∂(f_1 + f_2) = ∂(f_1) + ∂(f_2).
  ▶ Affine composition: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b). The derivations done in class can be used to show that if any other subgradient existed for g outside the stated set, it could be used to construct a subgradient for f outside its stated set as well.
  ▶ Norms, an important special case: f(x) = ||x||_p = max_{||z||_q ≤ 1} z^T x, where q is such that 1/p + 1/q = 1. (On the board we used y instead of z; this is derived in class.) Then
    ∂f(x) = { y : ||y||_q ≤ 1 and y^T x = max_{||z||_q ≤ 1} z^T x = ||x||_p },
    i.e., y corresponds to a z at which the max is attained. The constraint ||y||_q ≤ 1 holds because of Hölder's inequality, and this case is largely connected to the previous discussion on the max of convex functions. (A numerical check of this case follows below.)
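As a quick illustration of the dual-norm characterization of ∂||x||_p, the following sketch (assumed code, not part of the lecture) constructs the standard maximizing y for p = 1 and p = 2 and checks the subgradient inequality f(y) ≥ f(x) + h^T(y − x) on random points.

```python
import numpy as np

rng = np.random.default_rng(1)

def subgrad_norm(x, p):
    # A maximizer y of z^T x over ||z||_q <= 1 (with 1/p + 1/q = 1) is a subgradient.
    if p == 1:
        return np.sign(x)  # any s_i in [-1, 1] also works where x_i = 0
    if p == 2:
        nx = np.linalg.norm(x)
        return x / nx if nx > 0 else np.zeros_like(x)  # at 0: any h with ||h||_2 <= 1
    raise ValueError("sketch only covers p in {1, 2}")

for p in (1, 2):
    for _ in range(1000):
        x, y = rng.standard_normal(4), rng.standard_normal(4)
        h = subgrad_norm(x, p)
        # Subgradient inequality: ||y||_p >= ||x||_p + h^T (y - x).
        assert np.linalg.norm(y, p) >= np.linalg.norm(x, p) + h @ (y - x) - 1e-9
print("subgradient inequality holds on all samples")
```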

  6-7. Subgradients for the ‘Lasso’ Problem in Machine Learning
  We use the Lasso (min_x f(x)) as an example to illustrate subgradients of an affine composition:
    f(x) = (1/2)||y − x||_2² + λ||x||_1
  The subgradients of f(x) are h = x − y + λs, where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [−1, 1] if x_i = 0; equivalently, s is any vector in [−1, 1]^n with s^T x = ||x||_1. The second component ([−1, 1] at x_i = 0) is a result of the convex hull of the one-sided slopes ±1. (A worked optimality check follows below.)
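To make the convex-hull condition concrete, here is a small hedged sketch for this particular f (note the design matrix is the identity here): the minimizer is the soft-thresholding of y, and we check that s = (y − x*)/λ lands in the stated subdifferential, i.e., that 0 ∈ ∂f(x*). The helper name `soft_threshold` and the data are my choices, not the slides'.

```python
import numpy as np

lam = 0.5
y = np.array([2.0, -0.3, 0.0, 1.2])

def soft_threshold(v, t):
    # Elementwise shrinkage: the minimizer of (1/2)(v_i - u)^2 + t|u|.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = soft_threshold(y, lam)

# 0 in ∂f(x) iff for each i there is s_i with x_i - y_i + lam * s_i = 0,
# where s_i = sign(x_i) if x_i != 0 and s_i in [-1, 1] if x_i = 0.
s = (y - x) / lam  # the s that would make h = x - y + lam*s vanish
ok = all(
    (xi != 0 and np.isclose(si, np.sign(xi))) or (xi == 0 and abs(si) <= 1 + 1e-12)
    for xi, si in zip(x, s)
)
print("x* =", x, "| 0 in subdifferential:", ok)
```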

  8-9. More Subgradient Calculus: Composition
  The following functions, though convex, may not be differentiable everywhere. How does one compute their subgradients? (What holds for the subgradient also holds for the gradient.)
  Composition with functions: Let p : ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and let q : ℜ^n → ℜ^k. Define f(x) = p(q(x)). Then f is convex if
  ▶ each q_i is convex and p is convex and nondecreasing in each argument,
  ▶ or each q_i is concave and p is convex and nonincreasing in each argument.
  We will consider only the first case. In both conditions, the composition is instead concave if p is concave (with the corresponding monotonicity).
  Some examples illustrating this property (a numerical check follows below):
  ▶ exp q(x) is convex if q is convex, since exp is a monotonic (nondecreasing) and convex p.
  ▶ ∑_{i=1}^m log q_i(x) is concave if the q_i are concave and positive: p = log is concave, and hence the composition is concave.
  ▶ log ∑_{i=1}^m exp q_i(x) is convex if the q_i are convex.
  ▶ 1/q(x) is convex if q is concave and positive.
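The sketch below (my code; the affine q and all constants are arbitrary choices) numerically spot-checks three of the examples. An affine q is simultaneously convex and concave, so it can exercise both composition cases at once.

```python
import numpy as np

rng = np.random.default_rng(2)
a, c = rng.standard_normal(3), 5.0  # q(x) = a^T x + c, affine (both convex and concave)

q = lambda x: a @ x + c
f_exp = lambda x: np.exp(q(x))                          # convex: q convex, exp nondecreasing
f_lse = lambda x: np.log(np.exp(q(x)) + np.exp(-q(x)))  # convex: q and -q both convex
f_inv = lambda x: 1.0 / q(x)                            # convex where q > 0: q concave, positive

for _ in range(1000):
    x, y = 0.2 * rng.standard_normal(3), 0.2 * rng.standard_normal(3)
    t = rng.uniform(0.01, 0.99)
    z = t * x + (1 - t) * y
    for f in (f_exp, f_lse):
        assert f(z) <= t * f(x) + (1 - t) * f(y) + 1e-9
    if q(x) > 0 and q(y) > 0:  # restrict 1/q to its convex domain
        assert f_inv(z) <= t * f_inv(x) + (1 - t) * f_inv(y) + 1e-9
print("composition convexity checks passed")
```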

  10-12. More Subgradient Calculus: Composition (contd)
  Composition with functions: Let p : ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and let q : ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if
  ▶ each q_i is convex and p is convex and nondecreasing in each argument,
  ▶ or each q_i is concave and p is convex and nonincreasing in each argument.
  Subgradients for the first case (the second one is homework; a numerical illustration follows below):
  ▶ f(y) = p(q_1(y), . . . , q_k(y)) ≥ p(q_1(x) + h_{q_1}^T(y − x), . . . , q_k(x) + h_{q_k}^T(y − x)), where h_{q_i} ∈ ∂q_i(x) for i = 1..k, and since p(·) is nondecreasing in each argument.
  ▶ p(q_1(x) + h_{q_1}^T(y − x), . . . , q_k(x) + h_{q_k}^T(y − x)) ≥ p(q_1(x), . . . , q_k(x)) + h_p^T (h_{q_1}^T(y − x), . . . , h_{q_k}^T(y − x)), where h_p ∈ ∂p(q_1(x), . . . , q_k(x)).
  ▶ All we need to do next is club together h_p and the h_{q_i} and leave only (y − x) in the second component:
    p(q_1(x), . . . , q_k(x)) + h_p^T (h_{q_1}^T(y − x), . . . , h_{q_k}^T(y − x)) = f(x) + (∑_{i=1}^k (h_p)_i h_{q_i})^T (y − x).
  That is, ∑_{i=1}^k (h_p)_i h_{q_i} is a subgradient of the composite function at x.
  H/W: Derive the subdifferentials of the example functions on the previous slide.
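The following sketch (an assumed setup, not from the slides) instantiates the rule with p = log-sum-exp, which is convex and nondecreasing in each argument, and q_i(x) = ||A_i x + b_i||_1, which is convex and nonsmooth, then verifies that h = ∑_i (h_p)_i h_{q_i} satisfies the subgradient inequality on random points.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, m = 3, 4, 5
As = [rng.standard_normal((m, n)) for _ in range(k)]
bs = [rng.standard_normal(m) for _ in range(k)]

def q(x):
    # q_i(x) = ||A_i x + b_i||_1, each convex and nondifferentiable.
    return np.array([np.linalg.norm(A @ x + b, 1) for A, b in zip(As, bs)])

def f(x):
    # p(u) = log sum exp u: convex and nondecreasing in each argument.
    return np.log(np.sum(np.exp(q(x))))

def subgrad_f(x):
    u = q(x)
    h_p = np.exp(u) / np.sum(np.exp(u))  # gradient of p at q(x): the softmax
    h_q = [A.T @ np.sign(A @ x + b) for A, b in zip(As, bs)]  # h_{q_i} in ∂q_i(x)
    return sum(hp_i * hq_i for hp_i, hq_i in zip(h_p, h_q))

for _ in range(500):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    h = subgrad_f(x)
    assert f(y) >= f(x) + h @ (y - x) - 1e-9
print("composite subgradient inequality verified")
```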

  13-14. More Subgradient Calculus: Proximal Operator
  The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?
  Infimum: If c(x, y) is convex in (x, y) and C is a convex set, then d(x) = inf_{y ∈ C} c(x, y) is convex. (H/W: prove that d is convex if c is a convex function and C is a convex set.) For example:
  ▶ Let d(x, C) be the function that returns the distance of a point x to a convex set C. That is, d(x, C) = inf_{y ∈ C} ||x − y|| = ||x − P_C(x)||, where P_C(x) = argmin_{y ∈ C} ||x − y||. Then d(x, C) is a convex function and, for x ∉ C,
    ∇d(x, C) = (x − P_C(x)) / ||x − P_C(x)||.
    This underlies finding a point in the intersection of convex sets C_1, C_2, . . . , C_m by minimizing such distances (subgradients and alternating projections).
  ▶ P_C(x) = argmin_{y ∈ C} ||x − y|| is a special case of the proximity operator prox_c(x) = argmin_y ( c(y) + (1/2)||x − y||² ) of a convex function c; the special case is when c is the indicator function over C. (A small sketch follows below.)
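As a small illustration (my code, with C fixed to the unit Euclidean ball, an arbitrary choice), the projection P_C is exactly the prox of C's indicator, and the stated formula for ∇d(x, C) can be confirmed against finite differences:

```python
import numpy as np

rng = np.random.default_rng(4)

def proj_ball(x):
    # P_C(x) for C = {y : ||y||_2 <= 1}, i.e., the prox of C's indicator function.
    nx = np.linalg.norm(x)
    return x if nx <= 1 else x / nx

def dist(x):
    # d(x, C) = ||x - P_C(x)||.
    return np.linalg.norm(x - proj_ball(x))

x = rng.standard_normal(3)
x *= 3.0 / np.linalg.norm(x)          # place x at norm 3, safely outside C
g = (x - proj_ball(x)) / dist(x)      # ∇d(x, C) from the slide

# Central finite-difference check of the gradient.
eps = 1e-6
g_fd = np.array([
    (dist(x + eps * e) - dist(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print("analytic:", np.round(g, 6), "\nnumeric :", np.round(g_fd, 6))
assert np.allclose(g, g_fd, atol=1e-4)
```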
