(Sub)Gradients and Convexity (contd)
A subdifferential is the closed convex set of all subgradients of the convex function f:
  ∂f(x) = { h ∈ ℜⁿ : h is a subgradient of f at x }
Note that this set is guaranteed to be nonempty when f is convex (at any point in the interior of the domain of f).
Often an indicator function I_C : ℜⁿ → ℜ is employed to remove the constraints of an optimization problem (here C ⊆ ℜⁿ is a convex set):
  min_{x ∈ C} f(x) ⟺ min_x f(x) + I_C(x), where I_C(x) = I{x ∈ C} = { 0 if x ∈ C; ∞ if x ∉ C }
The subdifferential of the indicator function at x is known as the normal cone, N_C(x), of C:
  N_C(x) = ∂I_C(x) = { h ∈ ℜⁿ : hᵀx ≥ hᵀy for any y ∈ C }
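As a concrete illustration, here is a minimal numerical sketch in Python (the helper names subdiff_abs and normal_cone_interval, and the one-dimensional setting, are our own illustration, not part of the lecture): it encodes the subdifferential of f(x) = |x|, the normal cone of an interval C = [a, b], and spot-checks the subgradient inequality.

```python
import numpy as np

def subdiff_abs(x, tol=1e-12):
    """Subdifferential of f(x) = |x| at a point x (1-D case).

    Returns the interval [lo, hi] of valid subgradients:
    {-1} for x < 0, {+1} for x > 0, and [-1, 1] at x = 0.
    """
    if x > tol:
        return (1.0, 1.0)
    if x < -tol:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def normal_cone_interval(x, a, b, tol=1e-12):
    """Normal cone of C = [a, b] at x in C, as an interval of slopes.

    N_C(x) = {0} in the interior, (-inf, 0] at x = a, [0, inf) at x = b.
    """
    assert a - tol <= x <= b + tol, "x must lie in C"
    lo = -np.inf if abs(x - a) <= tol else 0.0
    hi = np.inf if abs(x - b) <= tol else 0.0
    return (lo, hi)

# The subgradient inequality f(y) >= f(x) + h*(y - x) holds for every
# h in subdiff_abs(0) = [-1, 1]; spot-check it on a grid of y values.
ys = np.linspace(-2, 2, 9)
for h in np.linspace(-1, 1, 5):
    assert all(abs(y) >= abs(0.0) + h * (y - 0.0) for y in ys)
```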
Normal Cones (Tangent Cone and Polar) for some Convex Sets
If C is a convex set and if...
▶ x ∈ int(C), then N_C(x) = {0}. In general, if x ∈ int(domain(f)) then ∂f(x) is nonempty and bounded.
▶ x ∈ C, then N_C(x) is a closed convex cone. In general, ∂f(x) is a (possibly empty) closed convex set, since it is the intersection of half-spaces.
There is a relation between the intuitive tangent cone and the normal cone at a point x ∈ ∂C (the boundary of C): this relation is the polar relation (a sketch of it follows below).
Let us construct the normal cone N_C(x) for some points in a convex set C:
[Figure: tangent cone and normal cone at points of a convex set C]
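A minimal sketch of the polar relation, assuming the standard example C = ℜ²₊ (the nonnegative orthant), which is our own choice and not in the slides: at the boundary point x = (0, 1) the tangent cone is {d : d₁ ≥ 0} and the normal cone is {h : h₁ ≤ 0, h₂ = 0}, and every normal direction makes a non-positive inner product with every tangent direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# C = nonnegative orthant in R^2; take the boundary point x = (0, 1).
# Known cones at x:  T_C(x) = {d : d1 >= 0},  N_C(x) = {h : h1 <= 0, h2 = 0}.
x = np.array([0.0, 1.0])

def in_tangent(d):
    return d[0] >= 0

# Polar relation: every h in N_C(x) makes a non-positive inner product
# with every d in T_C(x).  Spot-check with random samples.
for _ in range(1000):
    d = rng.normal(size=2)
    if not in_tangent(d):
        continue
    h = np.array([-abs(rng.normal()), 0.0])   # a random element of N_C(x)
    assert h @ d <= 1e-12
```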
Differentiable convex function has unique subgradient: Proof
Stated intuitively earlier. Now formally:
Let f : ℜⁿ → ℜ be a convex function. If f is differentiable at x ∈ ℜⁿ then ∂f(x) = {∇f(x)}.
We know from (9) that for a differentiable f : D → ℜ and open convex set D, f is convex iff, for any x, y ∈ D,
  f(y) ≥ f(x) + ∇ᵀf(x)(y − x)
(this is convexity in terms of the first-order approximation). Thus, ∇f(x) ∈ ∂f(x).
Let h ∈ ∂f(x); then hᵀ(y − x) ≤ f(y) − f(x). Since f is differentiable at x (so the directional derivative exists at x along any direction, including along y − x), we have that
  lim_{y → x} [ f(y) − f(x) − ∇ᵀf(x)(y − x) ] / ‖y − x‖ = 0
Thus for any ε > 0 there exists a δ > 0 such that [ f(y) − f(x) − ∇ᵀf(x)(y − x) ] / ‖y − x‖ < ε whenever ‖y − x‖ < δ. Multiplying both sides by ‖y − x‖ and adding ∇ᵀf(x)(y − x) to both sides, we get
  f(y) − f(x) < ∇ᵀf(x)(y − x) + ε‖y − x‖ whenever ‖y − x‖ < δ
Differentiable convex function has unique subgradient: Proof (contd)
But then, given that h ∈ ∂f(x), we obtain
  hᵀ(y − x) ≤ f(y) − f(x) < ∇ᵀf(x)(y − x) + ε‖y − x‖ whenever ‖y − x‖ < δ
Rearranging, we get (h − ∇f(x))ᵀ(y − x) < ε‖y − x‖ whenever ‖y − x‖ < δ.
At this point, we can choose any ε > 0 and any y − x whose norm is less than δ. Assuming h ≠ ∇f(x) (otherwise we are done), consider
  y − x = (δ/2) · (h − ∇f(x)) / ‖h − ∇f(x)‖₂
i.e., a unit vector scaled by δ/2, which has norm less than δ. Substituting into the previous step:
  (h − ∇f(x))ᵀ (δ/2) (h − ∇f(x)) / ‖h − ∇f(x)‖₂ < ε δ/2
Cancelling out common terms and evaluating the dot product as the Euclidean norm, we get ‖h − ∇f(x)‖ < ε. Since this should be true for any ε > 0, it must be that ‖h − ∇f(x)‖ = 0. Thus, h = ∇f(x).
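A quick numerical sanity check of the uniqueness claim, using our own example f(x) = x² (so ∇f(x) = 2x), not one from the slides: the subgradient inequality should hold at x = 1 for h = 2 and fail for any other slope.

```python
import numpy as np

# For the differentiable convex function f(x) = x^2, the subgradient
# inequality f(y) >= f(x) + h*(y - x) for all y should single out
# h = f'(x) = 2x as the only valid subgradient at x.
f = lambda x: x**2
fprime = lambda x: 2 * x

x = 1.0
ys = np.linspace(-3, 3, 601)

def is_subgradient(h):
    return np.all(f(ys) >= f(x) + h * (ys - x))

assert is_subgradient(fprime(x))             # h = 2 works
assert not is_subgradient(fprime(x) + 0.1)   # any other slope fails
assert not is_subgradient(fprime(x) - 0.1)
```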
The Why of (Sub)Gradient
Local and Global Minima, Gradients and Convexity
Recall that for functions of a single variable, at local extreme points the tangent to the curve is horizontal, and is therefore parallel to the x-axis.
▶ If the function is differentiable at the extreme point, then the derivative must vanish.
This idea can be extended to functions of multiple variables. The requirement in this case turns out to be that the tangent plane to (the graph of) the function at any extreme point must be parallel to the plane z = 0. Writing the graph as F(x, z) = f(x) − z = 0, this can happen if and only if the gradient ∇F is parallel to the z-axis at the extreme point,
▶ or equivalently, the gradient of the function f must be the zero vector at every extreme point.
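For instance (a standard worked example, not taken from the slides): for the paraboloid f(x) = x₁² + x₂², the graph is the surface F(x, z) = f(x) − z = 0 with ∇F = (2x₁, 2x₂, −1)ᵀ. This vector is parallel to the z-axis exactly when 2x₁ = 2x₂ = 0, i.e. at x = 0, which is precisely the point where ∇f(x) = (2x₁, 2x₂)ᵀ vanishes and f attains its (global, hence local) minimum.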
(Sub)Gradients and Optimality: Sufficient Condition
For a convex f,
  f(x*) = min_{x ∈ ℜⁿ} f(x) ⟸ 0 ∈ ∂f(x*)
The reason: h = 0 being a subgradient means that for all y,
  f(y) ≥ f(x*) + 0ᵀ(y − x*) = f(x*)
(Equivalently: if some h ∈ ∂f(x*) satisfies hᵀ(y − x*) ≥ 0 for all feasible y, then x* is a minimizer — sufficient condition 1; h = 0 being a subgradient is the special case — sufficient condition 2.)
The analogy to the differentiable case is ∂f(x) = {∇f(x)}. Thus, for a convex function f(x), if ∇f(x) = 0, then x must be a point of global minimum.
Is there a necessary condition for a differentiable (possibly non-convex) function having a (local or global) minimum at x? (A little later)
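As an illustration (our own example, not from the slides): for f(x) = (x − a)² + λ|x| we have ∂f(0) = [−2a − λ, −2a + λ], so 0 ∈ ∂f(0) exactly when |2a| ≤ λ, in which case x* = 0 is the global minimizer. A grid search confirms this numerically.

```python
import numpy as np

# Optimality check via 0 in ∂f(x) for f(x) = (x - a)^2 + lam * |x|.
# ∂f(0) = [-2a - lam, -2a + lam], so 0 is a subgradient at 0 iff
# |2a| <= lam, which is exactly when the minimizer is x* = 0.
a, lam = 0.3, 1.0          # |2a| = 0.6 <= 1.0, so x* = 0
f = lambda x: (x - a)**2 + lam * np.abs(x)

xs = np.linspace(-2, 2, 100001)
x_star = xs[np.argmin(f(xs))]
assert abs(x_star) < 1e-3   # the grid minimizer is (numerically) 0
```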
Local Extrema: Necessary Condition through Fermat's Theorem
A theorem fundamental to determining the locally extreme values of functions of multiple variables.
Claim
If f(x) defined on a domain D ⊆ ℜⁿ has a local maximum or minimum at x* and if the n first-order partial derivatives exist at x*, then f_{x_i}(x*) = 0 for all 1 ≤ i ≤ n.
Proof:
Local Extrema: Fermat's Theorem
To formally prove this result, consider the function
  g_i(x_i) = f(x₁*, x₂*, …, x_{i−1}*, x_i, x_{i+1}*, …, x_n*)
If f has a local minimum (maximum) at x*, then g_i also has a local minimum (maximum) at x_i*:
If f has a local minimum (maximum) at x*, then there exists an open ball B_ε = {x : ‖x − x*‖ < ε} around x* such that for all x ∈ B_ε, f(x*) ≤ f(x) (respectively, f(x*) ≥ f(x)). Consider the norm to be the Euclidean norm ‖·‖₂. By the Cauchy-Schwarz inequality, for the unit norm vector e_i = [0, …, 1, …, 0]ᵀ with a 1 only in the i-th index,
  |e_iᵀ(x − x*)| = |x_i − x_i*| ≤ ‖x − x*‖ ‖e_i‖ = ‖x − x*‖
Thus, the existence of an open ball {x : ‖x − x*‖ < ε} around x* characterizing the minimum in ℜⁿ also guarantees the existence of an open interval around x_i* characterizing the minimum of g_i(·) in ℜ. Since g_i therefore has a local extremum at x_i* and its derivative g_i′(x_i*) = f_{x_i}(x*) exists by assumption, the single-variable Fermat theorem yields f_{x_i}(x*) = 0 for each i.
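A numerical check of Fermat's theorem on a simple smooth example (the function and the central-difference helper are our own illustration): at the minimizer of f(x) = (x₁ − 1)² + (x₂ + 2)², both estimated first-order partials vanish.

```python
import numpy as np

# f has a (global, hence local) minimum at x* = (1, -2), so both
# first-order partial derivatives should vanish there.
def f(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

def partial(f, x, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative."""
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2 * h)

x_star = np.array([1.0, -2.0])
grads = [partial(f, x_star, i) for i in range(2)]
assert all(abs(g) < 1e-8 for g in grads)   # f_{x_i}(x*) = 0
```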