Adjoint Derivative Computation
Moritz Diehl and Carlo Savorgnan
There are several methods for calculating derivatives:
1. By hand
2. Symbolic differentiation
3. Numerical differentiation
4. "Imaginary trick" in MATLAB
5. Automatic differentiation: forward mode, and adjoint (or backward or reverse) mode
Calculating derivatives by hand
Time-consuming and error-prone.
Symbolic differentiation
We can obtain an expression for the derivatives we need with Mathematica, Maple, ...
Often this results in very long expressions that are expensive to evaluate.
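As an illustration (not from the slides), a minimal sketch of symbolic differentiation with SymPy, using the example function that appears later in the automatic differentiation slides:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.sin(x1 * x2) + sp.exp(x1 * x2 * x3)

# Each partial derivative comes back as a full symbolic expression;
# common subexpressions like exp(x1*x2*x3) are spelled out again in
# every entry, which is why such expressions can get long and
# expensive to evaluate for larger functions.
grad = [sp.diff(f, v) for v in (x1, x2, x3)]
for g in grad:
    print(g)
```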
Numerical differentiation 1/2
Consider a function f : R^n → R. A finite difference approximates the directional derivative:

∇f(x)^T p ≈ (f(x + t p) − f(x)) / t

Really easy to implement.
Problem: how should we choose t?
Numerical differentiation 2/2
Problem: how should we choose t?
A rule of thumb: set t = √ε, where ε is the machine precision or the precision of f. The accuracy of the derivative is then approximately √ε.
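A minimal Python/NumPy sketch of this rule of thumb (the helper name and test values are illustrative, not from the slides):

```python
import numpy as np

def directional_derivative_fd(f, x, p, t=None):
    """Forward-difference approximation of grad(f)(x)^T p.

    Rule of thumb from the slides: t = sqrt(eps), which gives an
    accuracy of roughly sqrt(eps) -- about half the significant digits.
    """
    if t is None:
        t = np.sqrt(np.finfo(float).eps)  # ~1.5e-8 in double precision
    return (f(x + t * p) - f(x)) / t

f = lambda x: np.sin(x[0] * x[1]) + np.exp(x[0] * x[1] * x[2])
x = np.array([1.0, 2.0, 0.5])
p = np.array([1.0, 0.0, 0.0])
print(directional_derivative_fd(f, x, p))  # accurate to ~8 digits only
```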
"Imaginary trick" in MATLAB
Consider an analytic function f : R^n → R. Set t = 10^−100. Then

∇f(x)^T p = Im(f(x + i t p)) / t

so ∇f(x)^T p can be calculated up to machine precision!
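The same complex-step idea works in Python, since NumPy's sin and exp accept complex arguments (function and test values are illustrative):

```python
import numpy as np

def directional_derivative_cs(f, x, p, t=1e-100):
    """Complex-step approximation: Im(f(x + i*t*p)) / t.

    Requires f to be analytic and implemented with operations that
    accept complex inputs. There is no subtraction of nearly equal
    numbers, so no cancellation error: the result is accurate to
    machine precision.
    """
    return np.imag(f(x + 1j * t * p)) / t

f = lambda x: np.sin(x[0] * x[1]) + np.exp(x[0] * x[1] * x[2])
x = np.array([1.0, 2.0, 0.5])
p = np.array([1.0, 0.0, 0.0])
print(directional_derivative_cs(f, x, p))  # accurate to ~16 digits
```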
Automatic differentiation
Consider a function f : R^n → R defined by using m elementary operations φ_i.

Function evaluation
Input: x_1, x_2, ..., x_n
Output: x_{n+m}
for i = n+1 to n+m
    x_i ← φ_i(x_1, ..., x_{i−1})
end for

Example
f(x_1, x_2, x_3) = sin(x_1 x_2) + exp(x_1 x_2 x_3)
Evaluation code (for m = 5 elementary operations):
x_4 ← x_1 x_2;  x_5 ← sin(x_4);  x_6 ← x_4 x_3;  x_7 ← exp(x_6);  x_8 ← x_5 + x_7
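The example's evaluation trace, written out as a small Python sketch (the function name is chosen here for illustration):

```python
import numpy as np

def f_trace(x1, x2, x3):
    """Evaluation trace of f(x1,x2,x3) = sin(x1*x2) + exp(x1*x2*x3),
    spelled out as the m = 5 elementary operations of the slide."""
    x4 = x1 * x2      # phi_4
    x5 = np.sin(x4)   # phi_5
    x6 = x4 * x3      # phi_6
    x7 = np.exp(x6)   # phi_7
    x8 = x5 + x7      # phi_8: the output x_{n+m}
    return x8
```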
Automatic differentiation: forward mode
Assume x(t) and f(x(t)), and write ẋ = dx/dt, ḟ = df/dt = J_f(x) ẋ.
For i = 1, ..., m:

dx_{n+i}/dt = Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) (dx_j/dt)

Forward automatic differentiation
Input: ẋ_1, ẋ_2, ..., ẋ_n (and all partial derivatives ∂φ_{n+i}/∂x_j)
Output: ẋ_{n+m}
for i = 1 to m
    ẋ_{n+i} ← Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) ẋ_j
end for
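A hand-written sketch of this forward sweep for the example trace above (a real AD tool generates this code automatically; names and test values are illustrative):

```python
import numpy as np

def f_forward(x, xdot):
    """Forward-mode sweep for f = sin(x1*x2) + exp(x1*x2*x3):
    propagates the seed direction xdot alongside the values and
    returns (f(x), grad(f)(x)^T xdot)."""
    x1, x2, x3 = x
    d1, d2, d3 = xdot
    x4 = x1 * x2;     d4 = d1 * x2 + x1 * d2   # product rule
    x5 = np.sin(x4);  d5 = np.cos(x4) * d4     # chain rule
    x6 = x4 * x3;     d6 = d4 * x3 + x4 * d3
    x7 = np.exp(x6);  d7 = np.exp(x6) * d6
    x8 = x5 + x7;     d8 = d5 + d7
    return x8, d8

# one sweep per direction: the full gradient needs n sweeps
print(f_forward((1.0, 2.0, 0.5), (1.0, 0.0, 0.0)))  # df/dx1
```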
Automatic differentiation: reverse mode

Reverse automatic differentiation
Input: all partial derivatives ∂φ_j/∂x_i
Output: x̄_1, ..., x̄_n
x̄_1, ..., x̄_{n+m−1} ← 0
x̄_{n+m} ← 1
for j = n+m down to n+1
    for all i = 1, 2, ..., j−1
        x̄_i ← x̄_i + x̄_j (∂φ_j/∂x_i)
    end for
end for
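The corresponding hand-written reverse sweep for the example trace (again a sketch of what an AD tool would generate, not a generic implementation):

```python
import numpy as np

def f_gradient_reverse(x):
    """Reverse-mode sweep for f = sin(x1*x2) + exp(x1*x2*x3):
    one forward pass stores all intermediate values, one backward
    pass accumulates the adjoints xbar_i. Returns the full gradient
    from a single sweep, independent of n."""
    x1, x2, x3 = x
    # forward pass: record intermediate values (this is the memory cost)
    x4 = x1 * x2
    x5 = np.sin(x4)
    x6 = x4 * x3
    x7 = np.exp(x6)
    # backward pass: xbar_8 = 1, all other adjoints start at 0
    b5, b7 = 1.0, 1.0          # phi_8 = x5 + x7
    b6 = b7 * np.exp(x6)       # phi_7 = exp(x6)
    b4 = b6 * x3               # phi_6 = x4 * x3
    b3 = b6 * x4
    b4 += b5 * np.cos(x4)      # phi_5 = sin(x4)
    b1 = b4 * x2               # phi_4 = x1 * x2
    b2 = b4 * x1
    return np.array([b1, b2, b3])

print(f_gradient_reverse((1.0, 2.0, 0.5)))  # full gradient in one sweep
```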
Automatic differentiation summary so far
f : R^n → R
Cost of forward mode per directional derivative: cost(∇f(x)^T p) ≤ 2 cost(f). For the full gradient ∇f, this needs 2 n cost(f)!
Cost of reverse mode for the full gradient: cost(∇f) ≤ 3 cost(f). Independent of n!
Only drawback: large memory is needed to store all intermediate values.
Automatic differentiation: summary
Automatic differentiation can be used for any f : R^n → R^m.
Cost of forward mode per forward direction p ∈ R^n: cost(J_f p) ≤ 2 cost(f)
Cost of reverse mode per reverse direction p ∈ R^m: cost(p^T J_f) ≤ 3 cost(f)
For computation of the full Jacobian J_f, the best mode depends on the sizes of n and m: forward mode needs n sweeps and reverse mode m sweeps, so forward wins when n is small and reverse when m is small.
Derivation of Adjoint Mode 1/3
Regard the function code as the computation of a vector which is "growing" at every iteration:

x̃_1 = Φ_1(x) = (x_1, x_2, x_3, ..., x_n, φ_{n+1}(x_1, x_2, x_3, ..., x_n))
...
x̃_m = Φ_m(x̃_{m−1}) = (x_1, x_2, x_3, ..., x_{n+m−1}, φ_{n+m}(x_1, x_2, x_3, ..., x_{n+m−1}))
Derivation of Adjoint Mode 2/3
Evaluation of f : R^n → R^q can then be written as

f(x) = Q Φ_m(Φ_{m−1}( ... Φ_2(Φ_1(x)) ... ))

with Q ∈ R^{q×(n+m)} a 0-1 matrix selecting the output variables, e.g. for q = 1

Q = (0 0 0 ... 0 1)

Then the full Jacobian is given by the chain rule:

J_f(x) = Q J_{Φ_m}(x̃_{m−1}) J_{Φ_{m−1}}(x̃_{m−2}) ... J_{Φ_1}(x)

where the Jacobian of each Φ_i is an identity block stacked on the gradient of the new component:

          ( 1  0  0  ...  0 )
          ( 0  1  0  ...  0 )
J_{Φ_i} = ( ...             )
          ( 0  0  0  ...  1 )
          ( ∂φ_{n+i}/∂x_1  ∂φ_{n+i}/∂x_2  ∂φ_{n+i}/∂x_3  ...  ∂φ_{n+i}/∂x_{n+i−1} )
Derivation of Adjoint Mode 3/3
Forward mode:

J_f p = Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} p = Q (J_{Φ_m} (J_{Φ_{m−1}} ... (J_{Φ_1} p)))

Adjoint mode:

p^T J_f = p^T Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} = (((p^T Q) J_{Φ_m}) J_{Φ_{m−1}}) ... J_{Φ_1}

The adjoint mode corresponds just to the efficient evaluation of the vector-matrix product p^T J_f!
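A numeric sketch of the two bracketings for the example trace (n = 3, m = 5). The matrices are formed explicitly only for illustration; real AD tools never build them. The helper name and test values are assumptions:

```python
import numpy as np

x1, x2, x3 = 1.0, 2.0, 0.5
x4 = x1 * x2
x5 = np.sin(x4)
x6 = x4 * x3
x7 = np.exp(x6)

def J(prev_len, grad_row):
    """J_Phi_i: identity stacked on the gradient row of phi_{n+i}."""
    return np.vstack([np.eye(prev_len), np.array(grad_row)])

J1 = J(3, [x2, x1, 0.0])                  # phi_4 = x1*x2
J2 = J(4, [0, 0, 0, np.cos(x4)])          # phi_5 = sin(x4)
J3 = J(5, [0, 0, x4, x3, 0])              # phi_6 = x4*x3
J4 = J(6, [0, 0, 0, 0, 0, np.exp(x6)])    # phi_7 = exp(x6)
J5 = J(7, [0, 0, 0, 0, 1, 0, 1])          # phi_8 = x5+x7
Q = np.zeros((1, 8)); Q[0, -1] = 1        # select the output x8

# forward mode: matrix-vector products, bracketed from the right
p = np.array([1.0, 0.0, 0.0])
fwd = Q @ (J5 @ (J4 @ (J3 @ (J2 @ (J1 @ p)))))

# adjoint mode: vector-matrix products, bracketed from the left
q = np.array([[1.0]])
adj = ((((q @ Q) @ J5) @ J4) @ J3) @ J2 @ J1  # the full gradient row

print(fwd, adj)  # fwd equals adj @ p up to rounding
```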
Software for Adjoint Derivatives

Generic tools to differentiate code:
- ADOL-C for C/C++, using operator overloading (open source)
- ADIC / ADIFOR for C/FORTRAN, using source code transformation (open source)
- TAPENADE, CppAD (open source), ...

Differential algebraic equation solvers with adjoints:
- SUNDIALS suite CVODES / IDAS (Sandia, open source)
- DAESOL-II (Uni Heidelberg)
- ACADO Integrators (Leuven, open source)