Adjoint Derivative Computation
Moritz Diehl and Carlo Savorgnan
There are several methods for calculating derivatives:
1. By hand
2. Symbolic differentiation
3. Numerical differentiation
4. "Imaginary trick" in MATLAB
5. Automatic differentiation: forward mode, and adjoint (or backward or reverse) mode
Calculating derivatives by hand
Time-consuming and error-prone.
Symbolic differentiation
We can obtain an expression for the derivatives we need with Mathematica, Maple, ...
Often this results in very long expressions that are expensive to evaluate.
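As an illustration (not from the slides), a minimal sketch of symbolic differentiation with SymPy, using the example function that appears later in the automatic differentiation slides:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.sin(x1 * x2) + sp.exp(x1 * x2 * x3)

# Each partial derivative comes back as a full symbolic expression;
# common subexpressions like exp(x1*x2*x3) are spelled out again in
# every entry, which is why such expressions can get long and
# expensive to evaluate for larger functions.
grad = [sp.diff(f, v) for v in (x1, x2, x3)]
for g in grad:
    print(g)
```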
Numerical differentiation 1/2
Consider a function f : R^n → R. A finite difference approximates the directional derivative:

∇f(x)^T p ≈ (f(x + t p) − f(x)) / t

Really easy to implement.
Problem: how should we choose t?
Numerical differentiation 2/2
Problem: how should we choose t?
A rule of thumb: set t = √ε, where ε is the machine precision or the precision of f. The accuracy of the derivative is then approximately √ε.
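A minimal Python/NumPy sketch of this rule of thumb (the helper name and test values are illustrative, not from the slides):

```python
import numpy as np

def directional_derivative_fd(f, x, p, t=None):
    """Forward-difference approximation of grad(f)(x)^T p.

    Rule of thumb from the slides: t = sqrt(eps), which gives an
    accuracy of roughly sqrt(eps) -- about half the significant digits.
    """
    if t is None:
        t = np.sqrt(np.finfo(float).eps)  # ~1.5e-8 in double precision
    return (f(x + t * p) - f(x)) / t

f = lambda x: np.sin(x[0] * x[1]) + np.exp(x[0] * x[1] * x[2])
x = np.array([1.0, 2.0, 0.5])
p = np.array([1.0, 0.0, 0.0])
print(directional_derivative_fd(f, x, p))  # accurate to ~8 digits only
```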
"Imaginary trick" in MATLAB
Consider an analytic function f : R^n → R. Set t = 10^−100. Then

∇f(x)^T p = Im(f(x + i t p)) / t

so ∇f(x)^T p can be calculated up to machine precision!
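The same complex-step idea works in Python, since NumPy's sin and exp accept complex arguments (function and test values are illustrative):

```python
import numpy as np

def directional_derivative_cs(f, x, p, t=1e-100):
    """Complex-step approximation: Im(f(x + i*t*p)) / t.

    Requires f to be analytic and implemented with operations that
    accept complex inputs. There is no subtraction of nearly equal
    numbers, so no cancellation error: the result is accurate to
    machine precision.
    """
    return np.imag(f(x + 1j * t * p)) / t

f = lambda x: np.sin(x[0] * x[1]) + np.exp(x[0] * x[1] * x[2])
x = np.array([1.0, 2.0, 0.5])
p = np.array([1.0, 0.0, 0.0])
print(directional_derivative_cs(f, x, p))  # accurate to ~16 digits
```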
Automatic differentiation
Consider a function f : R^n → R defined by using m elementary operations φ_i.

Function evaluation
Input: x_1, x_2, ..., x_n
Output: x_{n+m}
for i = n+1 to n+m
    x_i ← φ_i(x_1, ..., x_{i−1})
end for

Example
f(x_1, x_2, x_3) = sin(x_1 x_2) + exp(x_1 x_2 x_3)
Evaluation code (for m = 5 elementary operations):
x_4 ← x_1 x_2;  x_5 ← sin(x_4);  x_6 ← x_4 x_3;  x_7 ← exp(x_6);  x_8 ← x_5 + x_7
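The example's evaluation trace, written out as a small Python sketch (the function name is chosen here for illustration):

```python
import numpy as np

def f_trace(x1, x2, x3):
    """Evaluation trace of f(x1,x2,x3) = sin(x1*x2) + exp(x1*x2*x3),
    spelled out as the m = 5 elementary operations of the slide."""
    x4 = x1 * x2      # phi_4
    x5 = np.sin(x4)   # phi_5
    x6 = x4 * x3      # phi_6
    x7 = np.exp(x6)   # phi_7
    x8 = x5 + x7      # phi_8: the output x_{n+m}
    return x8
```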
Automatic differentiation: forward mode
Assume x(t) and f(x(t)), and write ẋ = dx/dt, ḟ = df/dt = J_f(x) ẋ.
For i = 1, ..., m:

dx_{n+i}/dt = Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) (dx_j/dt)

Forward automatic differentiation
Input: ẋ_1, ẋ_2, ..., ẋ_n (and all partial derivatives ∂φ_{n+i}/∂x_j)
Output: ẋ_{n+m}
for i = 1 to m
    ẋ_{n+i} ← Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) ẋ_j
end for
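A hand-written sketch of this forward sweep for the example trace above (a real AD tool generates this code automatically; names and test values are illustrative):

```python
import numpy as np

def f_forward(x, xdot):
    """Forward-mode sweep for f = sin(x1*x2) + exp(x1*x2*x3):
    propagates the seed direction xdot alongside the values and
    returns (f(x), grad(f)(x)^T xdot)."""
    x1, x2, x3 = x
    d1, d2, d3 = xdot
    x4 = x1 * x2;     d4 = d1 * x2 + x1 * d2   # product rule
    x5 = np.sin(x4);  d5 = np.cos(x4) * d4     # chain rule
    x6 = x4 * x3;     d6 = d4 * x3 + x4 * d3
    x7 = np.exp(x6);  d7 = np.exp(x6) * d6
    x8 = x5 + x7;     d8 = d5 + d7
    return x8, d8

# one sweep per direction: the full gradient needs n sweeps
print(f_forward((1.0, 2.0, 0.5), (1.0, 0.0, 0.0)))  # df/dx1
```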
Automatic differentiation: reverse mode

Reverse automatic differentiation
Input: all partial derivatives ∂φ_j/∂x_i
Output: x̄_1, ..., x̄_n
x̄_1, ..., x̄_{n+m−1} ← 0
x̄_{n+m} ← 1
for j = n+m down to n+1
    for all i = 1, 2, ..., j−1
        x̄_i ← x̄_i + x̄_j (∂φ_j/∂x_i)
    end for
end for
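The corresponding hand-written reverse sweep for the example trace (again a sketch of what an AD tool would generate, not a generic implementation):

```python
import numpy as np

def f_gradient_reverse(x):
    """Reverse-mode sweep for f = sin(x1*x2) + exp(x1*x2*x3):
    one forward pass stores all intermediate values, one backward
    pass accumulates the adjoints xbar_i. Returns the full gradient
    from a single sweep, independent of n."""
    x1, x2, x3 = x
    # forward pass: record intermediate values (this is the memory cost)
    x4 = x1 * x2
    x5 = np.sin(x4)
    x6 = x4 * x3
    x7 = np.exp(x6)
    # backward pass: xbar_8 = 1, all other adjoints start at 0
    b5, b7 = 1.0, 1.0          # phi_8 = x5 + x7
    b6 = b7 * np.exp(x6)       # phi_7 = exp(x6)
    b4 = b6 * x3               # phi_6 = x4 * x3
    b3 = b6 * x4
    b4 += b5 * np.cos(x4)      # phi_5 = sin(x4)
    b1 = b4 * x2               # phi_4 = x1 * x2
    b2 = b4 * x1
    return np.array([b1, b2, b3])

print(f_gradient_reverse((1.0, 2.0, 0.5)))  # full gradient in one sweep
```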
Automatic differentiation summary so far
f : R^n → R
Cost of forward mode per directional derivative: cost(∇f(x)^T p) ≤ 2 cost(f). For the full gradient ∇f, this needs 2 n cost(f)!
Cost of reverse mode for the full gradient: cost(∇f) ≤ 3 cost(f). Independent of n!
Only drawback: large memory is needed to store all intermediate values.
Automatic differentiation: summary
Automatic differentiation can be used for any f : R^n → R^m.
Cost of forward mode per forward direction p ∈ R^n: cost(J_f p) ≤ 2 cost(f)
Cost of reverse mode per reverse direction p ∈ R^m: cost(p^T J_f) ≤ 3 cost(f)
For computation of the full Jacobian J_f, the best mode depends on the sizes of n and m: forward mode needs n sweeps and reverse mode m sweeps, so forward wins when n is small and reverse when m is small.
Derivation of Adjoint Mode 1/3
Regard the function code as the computation of a vector which is "growing" at every iteration:

x̃_1 = Φ_1(x) = (x_1, x_2, x_3, ..., x_n, φ_{n+1}(x_1, x_2, x_3, ..., x_n))
...
x̃_m = Φ_m(x̃_{m−1}) = (x_1, x_2, x_3, ..., x_{n+m−1}, φ_{n+m}(x_1, x_2, x_3, ..., x_{n+m−1}))
Derivation of Adjoint Mode 2/3
Evaluation of f : R^n → R^q can then be written as

f(x) = Q Φ_m(Φ_{m−1}( ... Φ_2(Φ_1(x)) ... ))

with Q ∈ R^{q×(n+m)} a 0-1 matrix selecting the output variables, e.g. for q = 1

Q = (0 0 0 ... 0 1)

Then the full Jacobian is given by the chain rule:

J_f(x) = Q J_{Φ_m}(x̃_{m−1}) J_{Φ_{m−1}}(x̃_{m−2}) ... J_{Φ_1}(x)

where the Jacobian of each Φ_i is an identity block stacked on the gradient of the new component:

          ( 1  0  0  ...  0 )
          ( 0  1  0  ...  0 )
J_{Φ_i} = ( ...             )
          ( 0  0  0  ...  1 )
          ( ∂φ_{n+i}/∂x_1  ∂φ_{n+i}/∂x_2  ∂φ_{n+i}/∂x_3  ...  ∂φ_{n+i}/∂x_{n+i−1} )
Derivation of Adjoint Mode 3/3
Forward mode:

J_f p = Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} p = Q (J_{Φ_m} (J_{Φ_{m−1}} ... (J_{Φ_1} p)))

Adjoint mode:

p^T J_f = p^T Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} = (((p^T Q) J_{Φ_m}) J_{Φ_{m−1}}) ... J_{Φ_1}

The adjoint mode corresponds just to the efficient evaluation of the vector-matrix product p^T J_f!
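A numeric sketch of the two bracketings for the example trace (n = 3, m = 5). The matrices are formed explicitly only for illustration; real AD tools never build them. The helper name and test values are assumptions:

```python
import numpy as np

x1, x2, x3 = 1.0, 2.0, 0.5
x4 = x1 * x2
x5 = np.sin(x4)
x6 = x4 * x3
x7 = np.exp(x6)

def J(prev_len, grad_row):
    """J_Phi_i: identity stacked on the gradient row of phi_{n+i}."""
    return np.vstack([np.eye(prev_len), np.array(grad_row)])

J1 = J(3, [x2, x1, 0.0])                  # phi_4 = x1*x2
J2 = J(4, [0, 0, 0, np.cos(x4)])          # phi_5 = sin(x4)
J3 = J(5, [0, 0, x4, x3, 0])              # phi_6 = x4*x3
J4 = J(6, [0, 0, 0, 0, 0, np.exp(x6)])    # phi_7 = exp(x6)
J5 = J(7, [0, 0, 0, 0, 1, 0, 1])          # phi_8 = x5+x7
Q = np.zeros((1, 8)); Q[0, -1] = 1        # select the output x8

# forward mode: matrix-vector products, bracketed from the right
p = np.array([1.0, 0.0, 0.0])
fwd = Q @ (J5 @ (J4 @ (J3 @ (J2 @ (J1 @ p)))))

# adjoint mode: vector-matrix products, bracketed from the left
q = np.array([[1.0]])
adj = ((((q @ Q) @ J5) @ J4) @ J3) @ J2 @ J1  # the full gradient row

print(fwd, adj)  # fwd equals adj @ p up to rounding
```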
Software for Adjoint Derivatives

Generic tools to differentiate code:
- ADOL-C for C/C++, using operator overloading (open source)
- ADIC / ADIFOR for C/FORTRAN, using source code transformation (open source)
- TAPENADE, CppAD (open source), ...

Differential algebraic equation solvers with adjoints:
- SUNDIALS suite CVODES / IDAS (Sandia, open source)
- DAESOL-II (Uni Heidelberg)
- ACADO Integrators (Leuven, open source)