Automatic Differentiation for Computational Engineering


  1. Automatic Differentiation for Computational Engineering. Kailai Xu and Eric Darve, CME 216.

  2. Outline: 1. Overview, 2. Computational Graph, 3. Forward Mode, 4. Reverse Mode, 5. AD for Physical Simulation, 6. AD Through Implicit Operators, 7. Conclusion.

  3. Overview. Gradients are useful in many applications. Mathematical optimization: min_{x ∈ Rⁿ} f(x), solved with the gradient descent method x_{n+1} = x_n − α_n ∇f(x_n). Sensitivity analysis: f(x + Δx) ≈ f(x) + f′(x)Δx. Machine learning: training a neural network using automatic differentiation (back-propagation). Solving nonlinear equations: solve a nonlinear equation f(x) = 0 using Newton's method x_{n+1} = x_n − f(x_n)/f′(x_n).
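
As a concrete illustration of the optimization application above, here is a minimal gradient descent sketch in Python using JAX to supply the gradient. The objective, step size, and iteration count are arbitrary choices for illustration and are not from the slides.

```python
import jax
import jax.numpy as jnp

# Toy objective f: R^n -> R (an arbitrary smooth function, chosen only for illustration).
def f(x):
    return jnp.sum((x - 1.0) ** 2) + jnp.sin(x[0])

grad_f = jax.grad(f)               # AD returns the full gradient of f in one call

x = jnp.zeros(3)                   # starting point x_0
alpha = 0.1                        # fixed step size alpha_n
for _ in range(100):
    x = x - alpha * grad_f(x)      # x_{n+1} = x_n - alpha_n * grad f(x_n)

print(x, f(x))                     # x approaches a minimizer of f
```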

  4. Terminology. Deriving and implementing gradients by hand is a challenging and time-consuming process. Automatic differentiation: a set of techniques to numerically evaluate the derivative of a function specified by a computer program (Wikipedia). It also goes by other names such as autodiff, algorithmic differentiation, computational differentiation, and back-propagation. There are many AD software packages: (1) TensorFlow and PyTorch, deep learning frameworks in Python; (2) Adept-2, a combined array and automatic differentiation library in C++; (3) autograd, which efficiently computes derivatives of NumPy code; (4) ForwardDiff.jl and Zygote.jl, Julia differentiable programming packages. This lecture: how to compute gradients using automatic differentiation (AD), covering forward mode, reverse mode, and AD for implicit solvers.

  5. AD Software. https://github.com/microsoft/ADBench

  6. Finite Differences. f′(x) ≈ (f(x + h) − f(x))/h,  f′(x) ≈ (f(x + h) − f(x − h))/(2h). Derived from the definition of the derivative: f′(x) = lim_{h→0} (f(x + h) − f(x))/h. Conceptually simple. Curse of dimensionality: to compute the gradient of f: Rᵐ → R, you need at least O(m) function evaluations. Large numerical error: roundoff error.

  7. Finite Difference. Example: f(x) = sin(x), f′(x) = cos(x), x₀ = 0.1 (figure).
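
The two difference formulas from the Finite Differences slide can be tried on this example directly. A short Python/NumPy sketch, with illustrative step sizes; the exact derivative cos(0.1) is used as the reference.

```python
import numpy as np

def forward_diff(f, x, h):
    # One-sided formula: f'(x) ~ (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Central formula: f'(x) ~ (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.1
exact = np.cos(x0)                     # true derivative of sin at x0
for h in [1e-2, 1e-4, 1e-8, 1e-12]:
    err_fwd = abs(forward_diff(np.sin, x0, h) - exact)
    err_ctr = abs(central_diff(np.sin, x0, h) - exact)
    print(f"h={h:.0e}  forward error={err_fwd:.1e}  central error={err_ctr:.1e}")
```

Shrinking h first reduces the truncation error, but past a point roundoff error dominates and the approximation degrades.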

  8. Finite Difference. Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2017). Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research, 18(1), 5595–5637.

  9. Symbolic Differentiation. Symbolic differentiation computes exact derivatives (gradients): there is no approximation error. It works by recursively applying simple rules to symbols: d/dx(c) = 0, d/dx(x) = 1, d/dx(u + v) = d/dx(u) + d/dx(v), d/dx(uv) = v d/dx(u) + u d/dx(v), … Here c is a variable independent of x, and u, v are variables dependent on x. There may not exist convenient expressions for the analytical gradients of some functions, for example a blackbox function from a third-party library.
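
A brief sketch of what rule-based symbolic differentiation looks like in practice, using Python's SymPy package (SymPy is not mentioned in the slides; it is used here purely as an illustration).

```python
import sympy as sp

x = sp.symbols('x')
u = x**2           # u depends on x
v = sp.sin(x)      # v depends on x

# Product rule d/dx(u*v) = v*d/dx(u) + u*d/dx(v), applied symbolically:
print(sp.diff(u * v, x))                  # -> x**2*cos(x) + 2*x*sin(x)

# The result is an exact expression, not a number; it still has to be evaluated:
print(sp.diff(u * v, x).subs(x, 0.1))     # derivative evaluated at x = 0.1
```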

  10. Symbolic Differentiation. Symbolic differentiation can lead to complex and redundant expressions.

  11. Automatic Differentiation. AD is neither finite difference nor symbolic differentiation. It works by recursively applying simple rules to values: d/dx(c) = 0, d/dx(x) = 1, d/dx(u + v) = d/dx(u) + d/dx(v), d/dx(uv) = v d/dx(u) + u d/dx(v), … Here c is a variable independent of x, and u, v are variables dependent on x. It numerically evaluates gradients of “function units” using symbolic differentiation, and chains the computed gradients using the chain rule: d f(g(x))/dx = f′(g(x)) g′(x). It is efficient (linear in the cost of computing the function itself) and numerically stable.
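
A quick Python/JAX check of the chain rule statement above; the particular inner and outer functions are arbitrary illustrations.

```python
import jax
import jax.numpy as jnp

g = jnp.sin                       # inner function g(x)
f = lambda u: u**3 + 2.0 * u      # outer function f(u)

composed = lambda x: f(g(x))      # f(g(x))

x = 0.7
ad_value = jax.grad(composed)(x)                 # AD applies the chain rule internally
by_hand = jax.grad(f)(g(x)) * jax.grad(g)(x)     # f'(g(x)) * g'(x), chained manually

print(ad_value, by_hand)          # the two values agree to machine precision
```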

  12. Outline: 1. Overview, 2. Computational Graph, 3. Forward Mode, 4. Reverse Mode, 5. AD for Physical Simulation, 6. AD Through Implicit Operators, 7. Conclusion.

  13. Computational Graph. The “language” for automatic differentiation is the computational graph. The computational graph is a directed acyclic graph (DAG). Each edge represents data: a scalar, a vector, a matrix, or a high dimensional tensor. Each node is a function that consumes several incoming edges and outputs some values. (Figure: example DAG with parameter θ, intermediate values u₁, u₂, u₃, u₄, and output J, where u₂ = f₁(u₁, θ), u₃ = f₂(u₂, θ), u₄ = f₃(u₃, θ), J = f₄(u₁, u₂, u₃, u₄).) Let's build a computational graph for computing z = sin(x₁ + x₂) + x₂²x₃.

  14. Building a Computational Graph. z = sin(x₁ + x₂) + x₂²x₃ (figure: graph construction).

  15. Building a Computational Graph. z = sin(x₁ + x₂) + x₂²x₃ (figure: graph construction, continued).

  16. Building a Computational Graph. z = sin(x₁ + x₂) + x₂²x₃ (figure: graph construction, continued).
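
As a rough Python/JAX illustration of the graph that these slides build for z, jax.make_jaxpr traces the function and prints its intermediate operations; the printed jaxpr is JAX's own intermediate representation, not the exact diagram on the slides.

```python
import jax
import jax.numpy as jnp

def z(x1, x2, x3):
    # z = sin(x1 + x2) + x2^2 * x3, the example from the slides
    return jnp.sin(x1 + x2) + x2**2 * x3

# Trace the computation into a small DAG of primitive operations and print it.
print(jax.make_jaxpr(z)(1.0, 2.0, 3.0))
```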

  17. Computing Gradients from a Computational Graph. Automatic differentiation works by propagating gradients through the computational graph. Two basic modes: forward mode and reverse (backward) mode. Forward mode propagates gradients in the same direction as the forward computation. Reverse mode propagates gradients in the reverse direction of the forward computation.

  18. Computing Gradients from a Computational Graph. Different computational graph topologies call for different modes of automatic differentiation. One-to-many: forward propagation ⇒ forward-mode AD. Many-to-one: back-propagation ⇒ reverse-mode AD.

  19. Outline: 1. Overview, 2. Computational Graph, 3. Forward Mode, 4. Reverse Mode, 5. AD for Physical Simulation, 6. AD Through Implicit Operators, 7. Conclusion.

  20. Automatic Differentiation: Forward Mode AD. Forward-mode automatic differentiation uses the chain rule to propagate gradients: ∂/∂x (f ∘ g)(x) = f′(g(x)) g′(x). Derivatives are computed in the same order as the function evaluation. Each node in the computational graph aggregates the gradients from its upstream nodes and forwards the gradient to its downstream nodes.
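
One way to make the node-by-node rule above concrete is a tiny dual-number implementation of forward mode in pure Python. This is a pedagogical sketch, not how the packages mentioned earlier implement AD.

```python
import math

class Dual:
    """Carries a value and its derivative (v, v') together through the computation."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(d):
    # Elementary function rule: sin(u)' = cos(u) u'
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

# Seed x with derivative 1 and evaluate y = x^2 * sin(x); y.dot is dy/dx.
x = Dual(0.5, 1.0)
y = x * x * sin(x)
print(y.val, y.dot)   # dy/dx = 2x sin(x) + x^2 cos(x), evaluated at x = 0.5
```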

  21. Example: Forward Mode AD. Let's consider a specific way of computing f(x) = (x⁴, x² + sin(x), −sin(x))ᵀ.

  22. Example: Forward Mode AD. The computation uses the intermediate variables y₁ = x², y₂ = sin x, y₃ = y₁², y₄ = y₁ + y₂, y₅ = −y₂. Propagating values together with their derivatives: (y₁, y′₁) = (x², 2x), (y₂, y′₂) = (sin x, cos x).

  23. Example: Forward Mode AD (continued). (y₃, y′₃) = (y₁², 2y₁y′₁) = (x⁴, 4x³).

  24. Example: Forward Mode AD (continued). (y₄, y′₄) = (y₁ + y₂, y′₁ + y′₂) = (x² + sin x, 2x + cos x).

  25. Example: Forward Mode AD (continued). (y₅, y′₅) = (−y₂, −y′₂) = (−sin x, −cos x).
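
The same example can be checked in Python with JAX's forward-mode primitive jax.jvp; the evaluation point x = 0.5 is an arbitrary choice.

```python
import jax
import jax.numpy as jnp

def f(x):
    # f(x) = (x^4, x^2 + sin x, -sin x), built from y1 = x^2 and y2 = sin x
    y1, y2 = x**2, jnp.sin(x)
    return jnp.array([y1**2, y1 + y2, -y2])

x = 0.5
value, tangent = jax.jvp(f, (x,), (1.0,))   # seed dx = 1 propagates derivatives forward

print(value)    # (x^4, x^2 + sin x, -sin x)
print(tangent)  # (4x^3, 2x + cos x, -cos x), matching the hand derivation above
```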

  26. Summary. Forward-mode AD reuses gradients from upstream nodes. Therefore, this mode is useful for few-to-many mappings f: Rⁿ → Rᵐ, n ≪ m. Applications: sensitivity analysis, uncertainty quantification, etc. Consider a physical model f: Rⁿ → Rᵐ and let x ∈ Rⁿ be the quantity of interest (usually a low dimensional physical parameter). An uncertainty propagation method computes the perturbation of the model output (usually a high dimensional quantity, i.e., m ≫ 1): f(x + Δx) ≈ f(x) + f′(x)Δx.
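
A small Python/JAX sketch of this few-to-many setting; the model g mapping 2 parameters to 1000 outputs is a made-up stand-in for a physical model.

```python
import jax
import jax.numpy as jnp

# Hypothetical "physical model": 2 input parameters -> 1000 outputs (n << m).
def g(p):
    t = jnp.linspace(0.0, 1.0, 1000)
    return p[0] * jnp.exp(-p[1] * t)

p = jnp.array([2.0, 0.5])
dp = jnp.array([0.0, 0.01])        # perturbation of the input parameters

# One forward-mode pass gives the first-order output perturbation g'(p) dp.
out, dout = jax.jvp(g, (p,), (dp,))
print(out.shape, dout.shape)       # (1000,), (1000,)

# The full 1000 x 2 Jacobian is also cheap with forward mode because n is small.
J = jax.jacfwd(g)(p)
print(J.shape)                     # (1000, 2)
```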

  27. Outline: 1. Overview, 2. Computational Graph, 3. Forward Mode, 4. Reverse Mode, 5. AD for Physical Simulation, 6. AD Through Implicit Operators, 7. Conclusion.

  28. Reverse Mode AD. d f(g(x))/dx = f′(g(x)) g′(x). Gradients are computed in the reverse order of the forward computation. Each node in the computational graph aggregates the gradients from its downstream nodes and back-propagates the gradient to its upstream nodes.

  29. Example: Reverse Mode AD. z = sin(x₁ + x₂) + x₂²x₃ (figure: reverse-mode gradient propagation through the graph).

  30. Example: Reverse Mode AD. z = sin(x₁ + x₂) + x₂²x₃ (figure: reverse-mode gradient propagation, continued).

  31. Example: Reverse Mode AD. z = sin(x₁ + x₂) + x₂²x₃ (figure: reverse-mode gradient propagation, continued).

  32. Example: Reverse Mode AD. z = sin(x₁ + x₂) + x₂²x₃ (figure: reverse-mode gradient propagation, continued).
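
For reference, a short Python/JAX snippet reproduces the reverse-mode gradients of this example; the input values are arbitrary.

```python
import jax
import jax.numpy as jnp

def z(x1, x2, x3):
    # z = sin(x1 + x2) + x2^2 * x3
    return jnp.sin(x1 + x2) + x2**2 * x3

# Reverse mode: a single backward pass yields dz/dx1, dz/dx2, dz/dx3 together.
grads = jax.grad(z, argnums=(0, 1, 2))(1.0, 2.0, 3.0)
print(grads)   # (cos(x1 + x2), cos(x1 + x2) + 2*x2*x3, x2^2)
```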

  33. Summary. Reverse-mode AD reuses gradients from downstream nodes. Therefore, this mode is useful for many-to-few mappings f: Rⁿ → Rᵐ, n ≫ m. Typical applications: deep learning, where n is the total number of weights and biases of the neural network and m = 1 (the loss function); mathematical optimization, where there is usually only a single objective function.
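
A minimal Python/JAX sketch of the many-to-one case: one reverse pass yields the gradient of a scalar loss with respect to every parameter of a small, made-up network.

```python
import jax
import jax.numpy as jnp

# Made-up tiny model and data, chosen only to show the many-to-one structure.
params = {"W": jnp.ones((50, 10)) * 0.1, "b": jnp.zeros(10)}
x_batch = jnp.ones((32, 50))
y_batch = jnp.zeros((32, 10))

def loss(params, x, y):
    pred = jnp.tanh(x @ params["W"] + params["b"])
    return jnp.mean((pred - y) ** 2)          # scalar output (m = 1)

# Reverse mode differentiates the scalar loss w.r.t. all 510 parameters at once.
grads = jax.grad(loss)(params, x_batch, y_batch)
print(grads["W"].shape, grads["b"].shape)     # (50, 10) (10,)
```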
