Elements of differential calculus and optimization. Joan Alexis Glaunès, October 24, 2019
Differential Calculus in R^n
Partial derivatives

Partial derivatives of a real-valued function defined on R^n: f : R^n → R.

◮ Example: f : R^2 → R,
  f(x_1, x_2) = 2(x_1 - 1)^2 + x_1 x_2 + x_2^2
  ⇒ ∂f/∂x_1(x_1, x_2) = 4(x_1 - 1) + x_2,   ∂f/∂x_2(x_1, x_2) = x_1 + 2 x_2.

◮ Example: f : R^n → R,
  f(x) = f(x_1, ..., x_n) = (x_2 - x_1)^2 + (x_3 - x_2)^2 + ... + (x_n - x_{n-1})^2
  ⇒
  ∂f/∂x_1(x) = 2(x_1 - x_2)
  ∂f/∂x_2(x) = 2(x_2 - x_1) + 2(x_2 - x_3)
  ∂f/∂x_3(x) = 2(x_3 - x_2) + 2(x_3 - x_4)
  ...
  ∂f/∂x_{n-1}(x) = 2(x_{n-1} - x_{n-2}) + 2(x_{n-1} - x_n)
  ∂f/∂x_n(x) = 2(x_n - x_{n-1})
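Such formulas can be sanity-checked numerically: a partial derivative is approximated by a finite difference along one coordinate. The snippet below is a minimal Matlab sketch for the first example above; the test point x and the step delta are illustrative choices, not part of the slides.

% Finite-difference check of the partial derivatives of
% f(x1,x2) = 2*(x1-1)^2 + x1*x2 + x2^2 (illustrative sketch)
f = @(x) 2*(x(1)-1)^2 + x(1)*x(2) + x(2)^2;
x = [0.5; -1.0];          % arbitrary test point
delta = 1e-6;             % finite-difference step
e1 = [1; 0]; e2 = [0; 1];
d1 = (f(x + delta*e1) - f(x)) / delta;   % should be close to 4*(x(1)-1) + x(2)
d2 = (f(x + delta*e2) - f(x)) / delta;   % should be close to x(1) + 2*x(2)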
Differential Calculus in R^n
Directional derivatives

◮ Let x, h ∈ R^n. We can look at the derivative of f at x in the direction h. It is defined as
  f'_h(x) := lim_{ε→0} (f(x + εh) - f(x)) / ε,
i.e. f'_h(x) = g'(0) where g(ε) = f(x + εh) (the restriction of f along the line passing through x with direction h).

◮ The partial derivatives are in fact the directional derivatives in the directions of the canonical basis vectors e_i = (0, ..., 1, 0, ..., 0):
  ∂f/∂x_i(x) = f'_{e_i}(x).
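A directional derivative can be approximated the same way, by differentiating g(ε) = f(x + εh) numerically at 0. This is a minimal sketch under illustrative choices (the function f, the point x and the direction h are not prescribed by the slides):

% Directional derivative as the 1D derivative of g(eps) = f(x + eps*h)
f = @(x) sum((x(2:end) - x(1:end-1)).^2);   % example function used later in the slides
x = randn(5,1); h = randn(5,1);             % arbitrary point and direction
g = @(t) f(x + t*h);
delta = 1e-6;
dfh = (g(delta) - g(0)) / delta;            % approximates f'_h(x) = g'(0)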
Differential Calculus in R^n
Differential form and Jacobian matrix

◮ The map that sends any direction h to f'_h(x) is a linear map from R^n to R. It is called the differential form of f at x, and denoted f'(x) or Df(x). Its matrix in the canonical basis is called the Jacobian matrix at x. It is a 1 × n matrix whose coefficients are simply the partial derivatives:
  Jf(x) = ( ∂f/∂x_1(x), ..., ∂f/∂x_n(x) ).

◮ Hence one gets the expression of the directional derivative in any direction h = (h_1, ..., h_n) by multiplying this Jacobian matrix with the column vector of the h_i:
  f'_h(x) = f'(x).h = Jf(x) × h = ∂f/∂x_1(x) h_1 + ... + ∂f/∂x_n(x) h_n = Σ_{i=1}^n ∂f/∂x_i(x) h_i.
Differential Calculus in R^n
Differential form and Jacobian matrix

◮ More generally, if f : R^n → R^p, f = (f_1, ..., f_p), one defines the differential of f at x, f'(x) or Df(x), as the linear map from R^n to R^p whose matrix in the canonical basis is the p × n Jacobian matrix
  Jf(x) =
  [ ∂f_1/∂x_1(x)   ...   ∂f_1/∂x_n(x) ]
  [      ...       ...        ...     ]
  [ ∂f_p/∂x_1(x)   ...   ∂f_p/∂x_n(x) ]
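In other words, the Jacobian stacks the partial derivatives of each component f_k as a row. The sketch below uses an illustrative map f : R^3 → R^2 (not from the slides) and checks one column of its Jacobian by a finite difference.

% Illustrative map f : R^3 -> R^2 and its 2 x 3 Jacobian (sketch)
f  = @(x) [x(1)*x(2); x(2) + x(3)^2];
Jf = @(x) [x(2), x(1), 0;
           0,    1,    2*x(3)];
x = [1; 2; 3];
delta = 1e-6; e1 = [1; 0; 0];
col1 = (f(x + delta*e1) - f(x)) / delta;   % should be close to the first column of Jf(x)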
Differential Calculus in R^n
Differential form and Jacobian matrix

Some rules of differentiation:

◮ Linearity: if f(x) = a u(x) + b v(x), with u and v two functions and a, b two real numbers, then
  f'(x).h = a u'(x).h + b v'(x).h.

◮ The chain rule: if f : R^n → R is the composition of two functions v : R^n → R^p and u : R^p → R, i.e. f(x) = u(v(x)), then one has
  f'(x).h = (u ∘ v)'(x).h = u'(v(x)).(v'(x).h).
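The chain rule can also be checked numerically: apply the differential of v to h first, then the differential of u at v(x). The sketch below uses illustrative functions u and v (not from the slides) and compares the chain-rule value with a finite difference.

% Chain-rule sketch: f(x) = u(v(x)) with illustrative u and v
v  = @(x) [x(1)*x(2); x(1) + x(2)^2];
Jv = @(x) [x(2), x(1);
           1,    2*x(2)];          % Jacobian of v
u  = @(y) y(1)^2 + 3*y(2);
gu = @(y) [2*y(1); 3];             % gradient of u
fc = @(x) u(v(x));
x = [1.0; 2.0]; h = [0.1; -0.2];
dfh_chain = gu(v(x))' * (Jv(x) * h);          % u'(v(x)).(v'(x).h)
delta = 1e-6;
dfh_fd = (fc(x + delta*h) - fc(x)) / delta;   % should be close to dfh_chain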
Differential Calculus in R^n
Gradient

◮ If f : R^n → R, the matrix product Jf(x) × h can also be viewed as a scalar product between the vector h and the vector of partial derivatives. We call this vector of partial derivatives the gradient of f at x, denoted ∇f(x):
  f'(x).h = Σ_{i=1}^n ∂f/∂x_i(x) h_i = ⟨∇f(x), h⟩.

◮ Hence we have three equivalent ways of computing the derivative of a function: as a directional derivative, using the differential form notation, or using the partial derivatives.
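For instance, for the 2D example of the first slide the gradient can be written explicitly and paired with any direction h. A minimal sketch (the test point and direction are illustrative):

% Gradient of f(x1,x2) = 2*(x1-1)^2 + x1*x2 + x2^2 and a directional derivative
gradf = @(x) [4*(x(1)-1) + x(2); x(1) + 2*x(2)];
x = [0.5; -1.0]; h = [1; 2];
dfh = dot(gradf(x), h);   % equals f'(x).h = Jf(x)*h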
Differential Calculus in R^n
Example

Example with f(x) = Σ_{i=1}^{n-1} (x_{i+1} - x_i)^2:

◮ Using directional derivatives: we write
  g(ε) = f(x + εh) = Σ_{i=1}^{n-1} (x_{i+1} - x_i + ε(h_{i+1} - h_i))^2
  g'(ε) = 2 Σ_{i=1}^{n-1} (x_{i+1} - x_i + ε(h_{i+1} - h_i)) (h_{i+1} - h_i)
  f'(x).h = g'(0) = 2 Σ_{i=1}^{n-1} (x_{i+1} - x_i) (h_{i+1} - h_i)
Differential Calculus in R^n
Example

◮ Using differential forms: we write
  f(x) = Σ_{i=1}^{n-1} (x_{i+1} - x_i)^2
  f'(x) = 2 Σ_{i=1}^{n-1} (x_{i+1} - x_i) (dx_{i+1} - dx_i)
where dx_i denotes the differential form of the coordinate function x ↦ x_i, which is simply dx_i.h = h_i.

◮ Applying this differential form to a vector h, we retrieve
  f'(x).h = 2 Σ_{i=1}^{n-1} (x_{i+1} - x_i) (h_{i+1} - h_i)
Differential Calculus in R^n
Example

◮ Using partial derivatives: we write
  f'(x).h = f'_h(x) = Σ_{i=1}^n ∂f/∂x_i(x) h_i
          = 2(x_1 - x_2) h_1 + (2(x_2 - x_1) + 2(x_2 - x_3)) h_2 + ... + 2(x_n - x_{n-1}) h_n
Arranging terms differently, we finally get the same formula:
  f'(x).h = 2 Σ_{i=1}^{n-1} (x_{i+1} - x_i) (h_{i+1} - h_i)

◮ This computation is less straightforward because we first identified the terms corresponding to each h_i to compute the partial derivatives, and then grouped the terms back into the original summation.
Differential Calculus in R^n
Example

Corresponding Matlab codes: these two codes compute the gradient of f (they give exactly the same result).

◮ Code following the partial-derivative calculus: we compute the partial derivative ∂f/∂x_i(x) for each i and put it in coefficient i of the gradient.

function G = gradientf(x)
    % Gradient of f(x) = sum_i (x(i+1)-x(i))^2, one partial derivative per entry
    n = length(x);
    G = zeros(n,1);
    G(1) = 2*(x(1) - x(2));
    for i = 2:n-1
        G(i) = 2*(x(i) - x(i-1)) + 2*(x(i) - x(i+1));
    end
    G(n) = 2*(x(n) - x(n-1));
end
Differential Calculus in R^n
Example

◮ Code following the differential-form calculus: we compute each coefficient appearing in the summation and incrementally fill the corresponding coefficients of the gradient.

function G = gradientf(x)
    % Same gradient, accumulated term by term from the differential form
    n = length(x);
    G = zeros(n,1);
    for i = 1:n-1
        c = 2*(x(i+1) - x(i));   % coefficient of (h(i+1) - h(i)) in f'(x).h
        G(i+1) = G(i+1) + c;
        G(i) = G(i) - c;
    end
end

◮ This second code is better because it only requires the differential form, and also because it is faster: at each step of the loop, only one coefficient 2(x_{i+1} - x_i) is computed instead of two.
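As a side note, the same accumulation can be written without an explicit loop. The following vectorized variant is a sketch (not taken from the slides) and produces the same gradient:

function G = gradientf_vec(x)
    % Vectorized variant: same gradient as above, computed without a loop
    x = x(:);                        % make sure x is a column vector
    n = length(x);
    c = 2*(x(2:n) - x(1:n-1));       % coefficients 2*(x(i+1) - x(i))
    G = zeros(n,1);
    G(2:n)   = G(2:n)   + c;
    G(1:n-1) = G(1:n-1) - c;
end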
Gradient descent
Gradient descent algorithm

◮ Let f : R^n → R be a function. The gradient of f gives the direction in which the function increases the most. Conversely, the opposite of the gradient gives the direction in which the function decreases the most.

◮ Hence the idea of gradient descent is to start from a given vector x^0 = (x^0_1, x^0_2, ..., x^0_n), move from x^0 with a small step in the direction -∇f(x^0), recompute the gradient at the new position x^1 and move again in the direction -∇f(x^1), and repeat this process a large number of times to finally reach a position at which f has a minimal value.

◮ Gradient descent algorithm: choose an initial position x^0 ∈ R^n and a stepsize η > 0, and compute iteratively the sequence
  x^{k+1} = x^k - η ∇f(x^k).

◮ The convergence of the sequence to a minimizer of the function depends on properties of the function and on the choice of η (see later).
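Written in Matlab, the whole algorithm is a few lines. The sketch below reuses the gradientf routine from the previous example; the initialization, stepsize and number of iterations are illustrative choices (a stopping criterion on the norm of the gradient could be used instead of a fixed iteration count).

% Minimal gradient descent loop (sketch)
n = 10;
x = randn(n,1);                   % initial position x^0
eta = 0.1;                        % stepsize
for k = 1:200
    x = x - eta * gradientf(x);   % x^{k+1} = x^k - eta * grad f(x^k)
end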
Gradient descent
Gradient descent algorithm [figure]
Taylor expansion
First-order Taylor expansion of a function

◮ Let f : R^n → R. The first-order Taylor expansion at a point x ∈ R^n writes
  f(x + h) = f(x) + ⟨h, ∇f(x)⟩ + o(‖h‖),
or equivalently
  f(x + h) = f(x) + Σ_{i=1}^n h_i ∂f/∂x_i(x) + o(‖h‖).

◮ This means that, locally around the point x, f is approximated by an affine map (a constant plus a linear map applied to the displacement h).
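The o(‖h‖) behaviour can be observed numerically: for a small displacement h, the gap between f(x + h) and the first-order approximation should be of the order of ‖h‖². A minimal sketch, reusing the example function and the gradientf routine above (the test point and h are illustrative):

% First-order Taylor check on f(x) = sum((x(i+1)-x(i))^2)
f = @(x) sum((x(2:end) - x(1:end-1)).^2);
x = randn(5,1);
h = 1e-3 * randn(5,1);            % small displacement
G = gradientf(x);                 % gradient routine defined earlier
gap = f(x + h) - (f(x) + G'*h);   % should be of order norm(h)^2, hence o(norm(h))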
Taylor expansion
Hessian and second-order Taylor expansion

◮ The Hessian matrix of a function f is the matrix of second-order partial derivatives:
  Hf(x) =
  [ ∂²f/∂x_1²(x)      ...   ∂²f/∂x_1∂x_n(x) ]
  [      ...          ...         ...       ]
  [ ∂²f/∂x_n∂x_1(x)   ...   ∂²f/∂x_n²(x)    ]

◮ The second-order Taylor expansion writes
  f(x + h) = f(x) + ⟨h, ∇f(x)⟩ + (1/2) h^T Hf(x) h + o(‖h‖²),
where h is taken as a column vector and h^T is its transpose (row vector).

◮ Developing this formula gives
  f(x + h) = f(x) + Σ_{i=1}^n h_i ∂f/∂x_i(x) + (1/2) Σ_{i=1}^n Σ_{j=1}^n h_i h_j ∂²f/∂x_i∂x_j(x) + o(‖h‖²).
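For the running example f(x) = Σ (x_{i+1} - x_i)², the Hessian is constant: writing f(x) = ‖Dx‖² with D the forward-difference matrix, one gets ∇f(x) = 2DᵀDx and Hf(x) = 2DᵀD. The sketch below (dimensions and test vectors are illustrative) builds this matrix and checks that the second-order expansion is exact here, since f is quadratic.

% Hessian of f(x) = sum((x(i+1)-x(i))^2) and second-order Taylor check
n = 5;
D = zeros(n-1, n);                 % forward-difference matrix: (D*x)(i) = x(i+1) - x(i)
for i = 1:n-1
    D(i,i) = -1; D(i,i+1) = 1;
end
H = 2 * (D' * D);                  % constant Hessian matrix
f = @(x) sum((x(2:end) - x(1:end-1)).^2);
x = randn(n,1); h = 1e-2 * randn(n,1);
G = 2 * (D' * (D * x));            % gradient, written as 2*D'*D*x
gap = f(x + h) - (f(x) + G'*h + 0.5 * h' * H * h);   % ~0 up to rounding: f is quadratic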
Taylor expansion
Taylor expansion [figure]
Optimality conditions
1st-order optimality condition

◮ If x is a local minimizer of f, i.e. f(x) ≤ f(y) for any y in a small neighbourhood of x, then
  ∇f(x) = 0.

◮ A point x that satisfies ∇f(x) = 0 is called a critical point. So every local minimizer is a critical point, but the converse is false.

◮ In fact we distinguish three types of critical points: local minimizers, local maximizers, and saddle points (saddle points are simply critical points that are neither local minimizers nor local maximizers).

◮ Generally, the analysis of the Hessian matrix allows one to distinguish between these three types (see next slide).
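The standard second-order criterion alluded to here looks at the eigenvalues of the Hessian at the critical point: all positive implies a local minimizer, all negative a local maximizer, eigenvalues of both signs a saddle point, and zero eigenvalues leave the test inconclusive. A minimal sketch, assuming H holds the Hessian at a critical point (for instance the matrix built in the previous sketch):

% Classify a critical point from the eigenvalues of its Hessian H (sketch)
lambda = eig(H);
if all(lambda > 0)
    disp('local minimizer');
elseif all(lambda < 0)
    disp('local maximizer');
elseif any(lambda > 0) && any(lambda < 0)
    disp('saddle point');
else
    disp('inconclusive (some zero eigenvalues)');
end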