Gradient and Epigraph (contd)
As an example, consider the paraboloid $f(x_1, x_2) = x_1^2 + x_2^2 - 9$, which attains its minimum at $(0, 0)$. Below we see its epigraph and a supporting hyperplane (a lower bound) at $(0, 0)$.
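As a quick numerical sanity check (our own sketch, not part of the slides), the snippet below samples points and verifies that the horizontal plane $z = -9$ lower-bounds the paraboloid everywhere, i.e., it supports $epi(f)$ at $(0, 0, -9)$; the function name and sampling range are illustrative choices.

```python
import numpy as np

# Paraboloid from the slide: f(x1, x2) = x1^2 + x2^2 - 9, minimized at (0, 0).
def f(x1, x2):
    return x1**2 + x2**2 - 9.0

# Since grad f(0, 0) = (0, 0), the supporting hyperplane of epi(f) at (0, 0, -9)
# is the horizontal plane z = f(0, 0) = -9; check that it lower-bounds f.
rng = np.random.default_rng(0)
pts = rng.uniform(-5.0, 5.0, size=(1000, 2))
assert np.all(f(pts[:, 0], pts[:, 1]) >= f(0.0, 0.0))
print("the plane z = -9 supports epi(f) at (0, 0, -9)")
```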
Illustrations to understand Gradient
For the paraboloid $f(x_1, x_2) = x_1^2 + x_2^2 - 9$, the corresponding level-surface function is $F(x_1, x_2, z) = x_1^2 + x_2^2 - 9 - z$, and the point $\mathbf{x}^0 = (\mathbf{x}_0, z_0) = (1, 1, -7)$ lies on the 0-level surface of $F$.
The gradient $\nabla F(x_1, x_2, z) = [2x_1, 2x_2, -1]$, which, evaluated at $\mathbf{x}^0 = (1, 1, -7)$, is $[2, 2, -1]$.
The equation of the tangent plane to $f$ at $\mathbf{x}^0$ is therefore $2(x_1 - 1) + 2(x_2 - 1) - 7 = z$.
The paraboloid attains its minimum at $(0, 0)$. Plot the tangent plane to the surface at $(0, 0, f(0, 0))$ as well as the gradient vector $\nabla F$ at $(0, 0, f(0, 0))$. What do you expect?
Ans: A horizontal tangent plane and a vertical gradient!
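Here is a minimal matplotlib sketch (ours, assuming a standard 3D plot is intended) of the exercise above: it draws the paraboloid, the tangent plane at $(0, 0, f(0, 0))$, and the gradient $\nabla F$ there. One should see a horizontal plane and a vertical arrow, matching the stated answer.

```python
import numpy as np
import matplotlib.pyplot as plt

# f(x1, x2) = x1^2 + x2^2 - 9 and F(x1, x2, z) = f(x1, x2) - z (0-level surface of F).
f = lambda x1, x2: x1**2 + x2**2 - 9.0

x0 = np.array([0.0, 0.0])          # point of interest from the slide
z0 = f(*x0)                        # f(0, 0) = -9
grad_f = 2.0 * x0                  # gradient of f at x0 (the zero vector here)
grad_F = np.append(grad_f, -1.0)   # nabla F = [2 x1, 2 x2, -1]

X1, X2 = np.meshgrid(np.linspace(-3, 3, 60), np.linspace(-3, 3, 60))
Z_surf = f(X1, X2)
Z_tan = z0 + grad_f[0] * (X1 - x0[0]) + grad_f[1] * (X2 - x0[1])  # tangent plane

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(X1, X2, Z_surf, alpha=0.5)
ax.plot_surface(X1, X2, Z_tan, alpha=0.3)
# gradient of F at (x0, z0): the vertical vector [0, 0, -1] at the minimum
ax.quiver(x0[0], x0[1], z0, grad_F[0], grad_F[1], grad_F[2], length=2.0, color="r")
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("z")
plt.show()
```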
First-Order Convexity Conditions: The complete statement
Theorem. For differentiable $f : \mathcal{D} \rightarrow \Re$ and an open convex set $\mathcal{D}$:
1. $f$ is convex iff, for any $\mathbf{x}, \mathbf{y} \in \mathcal{D}$,
$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla^T f(\mathbf{x})(\mathbf{y} - \mathbf{x}) \qquad (9)$$
2. $f$ is strictly convex iff, for any $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ with $\mathbf{x} \neq \mathbf{y}$,
$$f(\mathbf{y}) > f(\mathbf{x}) + \nabla^T f(\mathbf{x})(\mathbf{y} - \mathbf{x}) \qquad \text{(strict lower bound)} \qquad (10)$$
3. $f$ is strongly convex iff, for any $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and for some constant $c > 0$,
$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla^T f(\mathbf{x})(\mathbf{y} - \mathbf{x}) + \frac{1}{2} c \|\mathbf{y} - \mathbf{x}\|^2 \qquad (11)$$
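A small numerical check of the theorem (not from the deck): the paraboloid $f(\mathbf{x}) = \|\mathbf{x}\|^2 - 9$ has Hessian $2I$, so it is strongly convex with $c = 2$, and both (9) and (11) should hold at arbitrary pairs of points. The helper names below are ours.

```python
import numpy as np

# Numerical check of (9) and (11) for f(x) = ||x||^2 - 9, which has Hessian 2I
# and is therefore strongly convex with constant c = 2.
f = lambda x: float(x @ x) - 9.0
grad = lambda x: 2.0 * x
c = 2.0

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    linear = f(x) + grad(x) @ (y - x)                      # right-hand side of (9)
    strong = linear + 0.5 * c * float((y - x) @ (y - x))   # right-hand side of (11)
    assert f(y) >= linear - 1e-9
    assert f(y) >= strong - 1e-9
print("inequalities (9) and (11) hold on all sampled pairs")
```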
First-Order Convexity Condition: Proof
Proof of sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we prove it only for statement (9). Suppose (9) holds. Consider $x_1, x_2 \in \mathcal{D}$ and any $\theta \in (0, 1)$. Let $x = \theta x_1 + (1 - \theta) x_2$. Then,
$$f(x_1) \geq f(x) + \nabla^T f(x)(x_1 - x) \qquad \text{(multiply by } \theta\text{)}$$
$$f(x_2) \geq f(x) + \nabla^T f(x)(x_2 - x) \qquad \text{(multiply by } 1 - \theta\text{)} \qquad (12)$$
Adding $(1 - \theta)$ times the second inequality to $\theta$ times the first, we get
$$\theta f(x_1) + (1 - \theta) f(x_2) \geq f(x) = f(\theta x_1 + (1 - \theta) x_2),$$
which proves that $f$ is a convex function. In the case of strict convexity, strict inequality holds in (12) and the same argument goes through. In the case of strong convexity, we need to additionally show that
$$\frac{1}{2} c\, \theta (1 - \theta) \|x_1 - x_2\|^2 = \theta\, \frac{1}{2} c \|x_1 - x\|^2 + (1 - \theta)\, \frac{1}{2} c \|x_2 - x\|^2 .$$
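The identity needed for the strong-convexity step is not spelled out on the slide; it follows by direct substitution of $x = \theta x_1 + (1 - \theta) x_2$:

```latex
% Substituting x = \theta x_1 + (1-\theta) x_2 gives
% x_1 - x = (1-\theta)(x_1 - x_2) and x_2 - x = -\theta (x_1 - x_2), so
\begin{align*}
\theta\,\tfrac{1}{2}c\,\|x_1 - x\|^2 + (1-\theta)\,\tfrac{1}{2}c\,\|x_2 - x\|^2
 &= \tfrac{1}{2}c\left[\theta(1-\theta)^2 + (1-\theta)\theta^2\right]\|x_1 - x_2\|^2\\
 &= \tfrac{1}{2}c\,\theta(1-\theta)\,\|x_1 - x_2\|^2,
\end{align*}
% which is exactly the extra term required by the definition of strong convexity.
```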
First-Order Convexity Conditions: Proofs
Necessity: Suppose $f$ is convex. Then for all $\theta \in (0, 1)$ and $x_1, x_2 \in \mathcal{D}$, we must have
$$f(\theta x_2 + (1 - \theta) x_1) \leq \theta f(x_2) + (1 - \theta) f(x_1)$$
Thus, the directional derivative of $f$ at $x_1$ along $x_2 - x_1$ satisfies
$$\nabla^T f(x_1)(x_2 - x_1) = \lim_{\theta \rightarrow 0} \frac{f\big(x_1 + \theta(x_2 - x_1)\big) - f(x_1)}{\theta} \leq f(x_2) - f(x_1)$$
This proves necessity for (9). The necessity proofs for (10) and (11) are very similar, except for a small difference in the case of strict convexity: the strict inequality is not preserved when we take limits. Suppose equality does hold in the case of strict convexity; that is, for a strictly convex function $f$, let
$$f(x_2) = f(x_1) + \nabla^T f(x_1)(x_2 - x_1) \qquad (13)$$
for some $x_2 \neq x_1$.
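To make the limiting argument concrete, a small numerical illustration (our example, reusing the paraboloid): the difference quotient stays below $f(x_2) - f(x_1)$ for every $\theta \in (0, 1]$ and approaches the directional derivative as $\theta \to 0$.

```python
import numpy as np

# Illustrate the necessity argument with f(x) = ||x||^2 - 9:
# (f(x1 + t (x2 - x1)) - f(x1)) / t  <=  f(x2) - f(x1)  for t in (0, 1],
# and the quotient tends to grad f(x1)^T (x2 - x1) as t -> 0.
f = lambda x: float(x @ x) - 9.0
grad = lambda x: 2.0 * x

x1, x2 = np.array([1.0, 1.0]), np.array([-2.0, 0.5])
dir_deriv = grad(x1) @ (x2 - x1)
for t in [1.0, 0.5, 0.1, 0.01, 0.001]:
    q = (f(x1 + t * (x2 - x1)) - f(x1)) / t
    assert q <= f(x2) - f(x1) + 1e-12
    print(f"t = {t:7.3f}  quotient = {q: .6f}  (directional derivative = {dir_deriv: .6f})")
```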
First-Order Convexity Conditions: Proofs
Necessity (contd. for the strict case): Because $f$ is strictly convex, for any $\theta \in (0, 1)$ we can write
$$f((1 - \theta) x_1 + \theta x_2) = f(x_1 + \theta(x_2 - x_1)) < (1 - \theta) f(x_1) + \theta f(x_2) \qquad (14)$$
Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get
$$f(x_1) + \theta \nabla^T f(x_1)(x_2 - x_1) \leq f\big(x_1 + \theta(x_2 - x_1)\big) < f(x_1) + \theta \nabla^T f(x_1)(x_2 - x_1),$$
which is a contradiction. Thus, equality can never hold in (9) for any $x_1 \neq x_2$. This proves the necessity of (10).
First-Order Convexity Conditions: The complete statement
The geometrical interpretation of this theorem is that at any point, the linear approximation based on the local derivative gives a lower estimate of the function, i.e., the convex function always lies above its supporting hyperplane at that point. This is pictorially depicted below.
(Tight) Lower-bound for any (non-differentiable) Convex Function?
For any convex function $f$ (even if non-differentiable):
▶ The epigraph $epi(f)$ will be convex.
▶ The convex epigraph $epi(f)$ will have a supporting hyperplane at every boundary point $(\mathbf{x}, f(\mathbf{x}))$.
[Figure: $epi(f)$ with supporting hyperplanes at a boundary point $\mathbf{x}$, each with normal $[\mathbf{h}, -1]$]
▶ There may exist multiple supporting hyperplanes at a point. Let a supporting hyperplane at $\mathbf{x}$ be characterized by the normal vector $[\mathbf{h}(\mathbf{x}), -1]$; when $f$ was differentiable, this vector was $[\nabla f(\mathbf{x}), -1]$.
▶ The supporting hyperplane is the set $\{[\mathbf{v}, z] \mid \langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{v}, z] \rangle = \langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{x}, f(\mathbf{x})] \rangle\}$, i.e., equality holds for all $[\mathbf{v}, z]$ on the hyperplane, and
$\langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{y}, z] \rangle \leq \langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{x}, f(\mathbf{x})] \rangle$ for all $[\mathbf{y}, z] \in epi(f)$, which also includes $[\mathbf{y}, f(\mathbf{y})]$.
▶ Thus: $\langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{y}, f(\mathbf{y})] \rangle \leq \langle [\mathbf{h}(\mathbf{x}), -1], [\mathbf{x}, f(\mathbf{x})] \rangle$ for all $\mathbf{y}$ in the domain of $f$.
▶ The normal to such a supporting hyperplane serves the same purpose as $[\nabla f(\mathbf{x}), -1]$.
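To see the "multiple supporting hyperplanes" point concretely, here is a short check (our example, not from the slides) with $f(x) = |x|$, which is convex but not differentiable at $x = 0$: any $h \in [-1, 1]$ serves as $h(x)$ there.

```python
import numpy as np

# f(x) = |x| is convex but not differentiable at 0.  At x = 0 every h in [-1, 1]
# gives a supporting hyperplane of epi(f) with normal [h, -1]:
# <[h, -1], [y, f(y)]>  <=  <[h, -1], [x, f(x)]>  reduces to  f(y) >= f(x) + h (y - x).
f = abs
x = 0.0
ys = np.linspace(-5.0, 5.0, 1001)
for h in [-1.0, -0.3, 0.0, 0.7, 1.0]:     # several subgradients at the kink
    lhs = h * ys - np.abs(ys)             # <[h, -1], [y, f(y)]>
    rhs = h * x - f(x)                    # <[h, -1], [x, f(x)]>
    assert np.all(lhs <= rhs + 1e-12)
print("each h in [-1, 1] gives a supporting hyperplane of epi(|.|) at (0, 0)")
```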
The What, Why and How of (sub)gradients
What of (sub)gradient:
▶ The normal to a supporting hyperplane of $epi(f)$ at the point $(\mathbf{x}, f(\mathbf{x}))$, i.e., the normal to a tightly lower-bounding linear approximation to the convex function.
▶ It need not be unique.
▶ The gradient is a subgradient when the function is differentiable.
Why of (sub)gradient:
▶ (Sub)gradients give necessary and sufficient conditions of optimality for convex functions.
▶ They are important for optimization algorithms and for dealing with constraints.
▶ Subgradients matter especially for non-differentiable functions and constrained optimization.
How of (sub)gradient:
▶ How to compute the subgradient of complex non-differentiable convex functions.
▶ Via the calculus of convex functions and of subgradients.
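As a "How" sketch (ours, not from the slides), the snippet below uses one standard calculus rule: for $f(\mathbf{x}) = \max_i (\mathbf{a}_i^T \mathbf{x} + b_i)$, any $\mathbf{a}_j$ attaining the max at $\mathbf{x}$ is a subgradient there, and it satisfies the subgradient inequality.

```python
import numpy as np

# Calculus rule: for f(x) = max_i (a_i^T x + b_i), any a_j with j attaining the
# max at x is a subgradient of f at x.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(5, 3)), rng.normal(size=5)   # five affine pieces in R^3

def f(x):
    return np.max(A @ x + b)

def subgradient(x):
    j = int(np.argmax(A @ x + b))   # an active piece (ties: pick any)
    return A[j]

# Verify the subgradient inequality f(y) >= f(x) + g^T (y - x) on random pairs.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    g = subgradient(x)
    assert f(y) >= f(x) + g @ (y - x) - 1e-12
print("max-of-affine subgradient rule verified on all samples")
```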