Automatic Differentiation-based perturbation methods for uncertainties and errors
Anca Belme, Massimiliano Martinelli, Laurent Hascoët, Valérie Pascual, Alain Dervieux
Tropics project, INRIA, http://www-sop.inria.fr/tropics/
Overview:
- 1. Progress in Automatic Differentiation,
- 2. Numerical error reduction,
- 3. Perturbation methods for uncertainties.
- 1. Progress in Automatic Differentiation
The TAPENADE program differentiator: TAPENADE (115 000 lines of Java) transforms a Fortran program into a Fortran program that computes the derivative of the given program. TAPENADE is organised around an intermediate step: the analysis and differentiation of an abstract imperative language. http://www-sop.inria.fr/tropics/
Overall architecture of TAPENADE: Fortran77, Fortran95 and C parsers feed a common Imperative Language (IL) representation; the analyzer and differentiation engine operate on IL; Fortran77, Fortran95 and C printers regenerate source code.
Differentiability/Differentiation modes

Direct/Tangent mode:
- Differentiated code computes (c, δc) → j′(c).δc, the Gâteaux derivative of a functional
- Computational cost factor: αT ≈ 4
- Chain rule applies in the same order as the initial code
- Does not store intermediate variables
Backward/Reverse mode:
- Differentiated code computes (c, δj) → (j′(c))*.δj, the gradient of a functional
- Computational cost factor: αR ≈ 5
- Chain rule applies in reverse order to the initial code
- Needs to store intermediate variables in a first direct sweep
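The two modes can be illustrated on a toy program (a hypothetical functional, hand-differentiated statement by statement the way a tool like TAPENADE would): tangent mode pushes a direction δc forward through the statements, reverse mode stores the intermediates in a direct sweep and then pulls adjoints back through them in reverse order.

```python
import numpy as np

# Original code: j(c) = sum(sin(A @ c) ** 2), written as a sequence of
# elementary statements so the two AD modes can be shown explicitly.
A = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]])

def j(c):
    t = A @ c              # statement 1
    s = np.sin(t)          # statement 2
    return np.sum(s ** 2)  # statement 3

def j_tangent(c, dc):
    # Tangent mode: chain rule in the same order as the original code.
    t = A @ c;        dt = A @ dc
    s = np.sin(t);    ds = np.cos(t) * dt
    return 2.0 * np.sum(s * ds)     # derivative of sum(s**2)

def j_reverse(c):
    # Reverse mode: a first direct sweep stores the intermediates (t, s),
    # then the chain rule is applied in reverse order on the adjoints.
    t = A @ c
    s = np.sin(t)
    s_bar = 2.0 * s                 # adjoint of statement 3
    t_bar = np.cos(t) * s_bar       # adjoint of statement 2
    return A.T @ t_bar              # adjoint of statement 1: the gradient

c = np.array([0.3, -0.8])
dc = np.array([1.0, 0.5])
# Consistency: j'(c).dc computed by both modes must agree.
print(np.isclose(j_tangent(c, dc), j_reverse(c) @ dc))  # True
```

One tangent sweep yields one directional derivative; one reverse sweep yields the whole gradient, which is why reverse mode is preferred for functionals with many inputs.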
Reverse differentiation strategies for iterative solvers

Find Wh such that |Ψh(Wh)| ≤ ε. If the iteration for Wh is stopped by a test with ε > 0, then Wh depends on the initial conditions of the solution algorithm, in a similar manner to an unsteady system; moreover, Wh is only piecewise differentiable. It is then possible to apply R-AD to the whole program, including the iterative solution algorithm. Because of the dependence on intermediate states, these states need to be stored (or recomputed). For this case and for unsteady problems, efficient storage or recomputation strategies can be introduced by the user by means of "checkpoint" controls. Assuming ε = 0, the state Wh does not depend on the initial conditions of the solution algorithm and is differentiable with respect to ch. The iteration should then be flagged as "independent" (or parallel) in order to avoid unnecessary storage of intermediate values.
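A minimal sketch of the ε = 0 situation, on a hypothetical scalar fixed point: once fully converged, the state no longer depends on the starting guess, and its derivative can be obtained from the implicit function theorem instead of storing every iterate of the solver.

```python
import numpy as np

# Toy steady state (hypothetical): psi(c, w) = w - (c + 0.5*sin(w)) = 0,
# solved by the contractive fixed-point iteration w <- c + 0.5*sin(w).
def solve_state(c, w0=0.0, tol=1e-14, max_iter=200):
    w = w0
    for _ in range(max_iter):
        w_new = c + 0.5 * np.sin(w)
        if abs(w_new - w) <= tol:
            return w_new
        w = w_new
    raise RuntimeError("fixed point did not converge")

def dw_dc(c):
    # With eps = 0 the converged state is independent of w0, and the
    # implicit function theorem gives dw/dc = -(dpsi/dw)^-1 * dpsi/dc,
    # with no need to store the intermediate iterates.
    w = solve_state(c)
    dpsi_dw = 1.0 - 0.5 * np.cos(w)   # d/dw [w - c - 0.5*sin(w)]
    dpsi_dc = -1.0                    # d/dc [w - c - 0.5*sin(w)]
    return -dpsi_dc / dpsi_dw

c = 0.7
fd = (solve_state(c + 1e-6) - solve_state(c - 1e-6)) / 2e-6
print(abs(dw_dc(c) - fd) < 1e-6)  # implicit derivative matches finite differences
```

This is exactly the saving the "independent" flag exploits: the reverse sweep can restart from the converged state alone.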
The TAPENADE program differentiator, recent advances: in its version 3 (alpha), TAPENADE takes into account user directives inserted into the sources. The directive II-LOOP allows a more efficient reverse differentiation of parallel loops by inhibiting storage that is useless in the parallel case. The directive NOCHECKPOINT can be used for optimal checkpointing in reverse differentiation: TAPENADE lets the user specify finely which procedure calls must be checkpointed or not. TAPENADE V3 has been extended to deal with Fortran95 and with ANSI C. TAPENADE is optimised for successive reverse differentiations.
- 2. Numerical error reduction

Abstract representation of the partial differential equation: Ψ(W) = 0. Discretisation of the PDE: Ψh(Wh) = 0 ∈ R^N, Wh ∈ R^N, Wh = [(Wh)i]. Operators between continuous and discrete:

Rh : R^N → V ⊂ L2(Ω) ∩ C0(Ω̄),  vh ↦ Rh vh
Th : V → R^N,  v ↦ Th v

Wh(x, y, z) = (Rh Wh)(x, y, z) ;  W − Wh = small? corrector?
A priori estimate:
Ψh(W) − Ψh(Wh) = Ψh(W)  ⇒  W − Wh ≈ [∂Ψh/∂Wh]^-1 Ψh(W),
in practice: δWh = Rh [∂Ψh/∂Wh]^-1 (Ψh − Th Ψ)(W(h)).

A posteriori estimate:
Ψ(W) − Ψ(Wh) = −Ψ(Wh)  ⇒  W − Wh ≈ −[∂Ψ/∂W]^-1 Ψ(Wh),
in practice: δWh = −Rh [∂Ψh/∂Wh]^-1 Th Ψ(Rh Wh); quadrature formulas can be used for Ψ(Rh Wh).
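A scalar toy illustration of the a posteriori corrector (both equations below are hypothetical stand-ins, and Rh, Th are taken as the identity): evaluating the residual of the "continuous" equation at the discrete solution and multiplying by the inverse discrete Jacobian removes the leading-order discretisation error.

```python
import numpy as np

# Hypothetical toy problem: the "continuous" equation is
# psi(W) = W**2 - 2 = 0 (root sqrt(2)); the "discrete" one carries an
# O(h) consistency error: psi_h(W) = W**2 - 2 + h.
h = 1e-2

def psi(w):
    return w**2 - 2.0

w_exact = np.sqrt(2.0)
w_h = np.sqrt(2.0 - h)        # discrete solution: psi_h(w_h) = 0

# A posteriori corrector: delta = -[dpsi_h/dW]^-1 psi(w_h), using the
# residual of the *continuous* equation at the discrete state.
dpsi_h = 2.0 * w_h            # Jacobian of psi_h at w_h
delta = -psi(w_h) / dpsi_h

print(abs(w_exact - w_h))             # O(h) error of the plain solution
print(abs(w_exact - (w_h + delta)))   # corrected solution: O(h^2) error
```

The corrected value w_h + δ is accurate to second order in h, mirroring δWh = −Rh [∂Ψh/∂Wh]^-1 Th Ψ(Rh Wh).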
Goal-oriented error:

j(W) = (g, W)_{L2(Ω)} s.t. Ψ(W) = 0,  (∂Ψ/∂W)* p = g
jh = (g, Rh Wh)_{L2(Ω)} s.t. Ψh(Wh) = 0,  gh = Th g
(∂Ψh/∂Wh)* ph = gh  ⇔  ph = [∂Ψh/∂Wh]^-* gh,  with Rh ph its continuous reconstruction.

A fundamental assumption of the present analysis is that this discrete adjoint is a good enough approximation of the continuous adjoint.
A posteriori goal-oriented errors and correctors

Field corrector: δWh = −Rh [∂Ψh/∂Wh]^-1 Th Ψ(Wh)
Direct-linearized goal-oriented corrector: δ1j = −( g , Rh [∂Ψh/∂Wh]^-1 Th Ψ(Wh) )_{L2(Ω)}
Adjoint-based goal-oriented corrector (Giles-Pierce): δ1j = −( ph , Th Ψ(Wh) )_{L2(Ω)}
An equivalence: assuming that we have chosen Th = Rh*, then the direct-linearised and adjoint-based correctors are identical:

−( g , Rh [∂Ψh/∂Wh]^-1 Th Ψ(Wh) )_{L2(Ω)}
= −( [∂Ψh/∂Wh]^-* Rh* g , Th Ψ(Wh) )_{R^N}
= −( Th* [∂Ψh/∂Wh]^-* Rh* g , Ψ(Wh) )_{L2(Ω)}
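The identity can be checked numerically on a small linear stand-in (Rh, the Jacobian A, the residual r and the goal g below are arbitrary toy data; the only structural assumption is Th = Rh*, here the transpose):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 6   # discrete dofs / "continuous" sample dimension (toy sizes)

# Hypothetical linear stand-ins: Rh reconstructs, Th = Rh^T restricts,
# A plays dPsi_h/dW_h, r the residual Psi(W_h), g the goal functional.
Rh = rng.normal(size=(M, N))
Th = Rh.T                                      # the assumption Th = Rh*
A = rng.normal(size=(N, N)) + 5.0 * np.eye(N)  # well-conditioned Jacobian
r = rng.normal(size=M)
g = rng.normal(size=M)

# Direct-linearized corrector: -(g, Rh A^-1 Th r)
direct = -g @ (Rh @ np.linalg.solve(A, Th @ r))

# Adjoint-based corrector: -(A^-* Rh* g, Th r)
p = np.linalg.solve(A.T, Rh.T @ g)   # discrete adjoint state
adjoint = -p @ (Th @ r)

print(np.isclose(direct, adjoint))   # True: the two correctors coincide
```

The adjoint version solves one transposed system per goal g, while the direct version solves one system per residual; the equality is what lets either be chosen on cost grounds.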
A preliminary numerical example: 3D Euler flow around a wing with NACA0012 section.
- 1. Direct calculation
- 2. Direct calculation + correction
CPU time is 30% larger; the error on entropy deviation is 10 times smaller.
(Figure: spurious entropy generation for a direct computation of a steady flow and for a corrected one.)
- 2. Numerical error reduction by correction, concluded

A posteriori goal-oriented errors and correctors:
Direct-linearized goal-oriented corrector: δ1j = −( g , Rh [∂Ψh/∂Wh]^-1 Th Ψ(Wh) )_{L2(Ω)}
Adjoint-based goal-oriented corrector: δ1j = −( ph , Th Ψ(Wh) )_{L2(Ω)}

Adjoints, as compared with linearised perturbations, are slightly more complex to compute for steady PDEs, and much more complex for unsteady PDEs (even though TAPENADE is well suited to this exercise). For computing correctors and for several mesh adaptation applications, the direct-linearized goal-oriented corrector can be used for an identical result.
- 2. A priori numerical error reduction, mesh adaptation

Minimizing the P1-interpolation error (Loseille, PhD, Paris VI, 2008):

||u − πM u||²_{L2} = ∫ ( |∂²u/∂ξ²| mξ² + |∂²u/∂η²| mη² )² dx dy

where ξ and η are the directions of diagonalization of the Hessian of u. Solve min_M E_M under the constraint N_M = N. This can be solved analytically.
Optimal metric for interpolation:

Mopt = C N R^-1 diag( |∂²u/∂η²|^{-5/6} |∂²u/∂ξ²|^{1/6} , |∂²u/∂ξ²|^{-5/6} |∂²u/∂η²|^{1/6} ) R    (1)

with: C = ∫ ( |∂²u/∂ξ²| · |∂²u/∂η²| )^{2/6} dx dy.

But this only minimises an interpolation error...
Approximation a priori error (with Alauzet & Loseille)

Steady Euler equations: W ∈ V = (H1(Ω))⁵, ∀φ ∈ V,

(Ψ(W) , φ) = ∫_Ω φ ∇·F(W) dΩ − ∫_Γ φ F̂(W)·n dΓ = 0

where Γ = ∂Ω and F̂ denotes the boundary-condition fluxes. Mixed-Element-Volume approximation: ∀φh ∈ Vh,

∫_{Ωh} φh ∇·Πh F(Wh) dΩh − ∫_{Γh} φh Πh F̂(Wh)·n dΓh = −∫_{Ωh} φh Dh(Wh) dΩh,

where Πh is the usual elementwise linear interpolation and Dh stands for a numerical dissipation term.
A priori adjoint-based error estimate:

(g, W − Wh) ≈ ((Ψh − Ψ)(W), P),  with (∂Ψ/∂W)* P = g.

After some transformations, the optimal mesh is obtained by solving:

Find Mopt = Argmin_M ∫_Ω |∇P| |F(W) − πM F(W)| dΩ + ∫_Γ |P| |(F̄(W) − πM F̄(W))·n| dΓ    (2)

under the constraint C(M) = N. Solved analytically, as in the interpolation case. Remark: the adjoint-based formulation is compulsory.
Application to sonic boom: Hessian-based adaptation (Mach, L2) vs. goal-oriented + Hessian-based adaptation ("footprint" functional). (Figure: pressure track at ground.)
- 3. Hessian-based perturbation methods for uncertainties

A differentiation problem: given a computer program computing a functional j(c) for c ∈ R^n,

j(c) = J(c, W(c)) ,  such that Ψ(c, W) = 0 ,

we want, by applying a source-to-source program differentiation software, viz. TAPENADE, to get a computer program computing the second derivatives d²j/dc² = (Hik). The solution W(c) ∈ R^N of the state equation Ψ(c, W) = 0 is supposed here to be a steady state. It can be obtained by explicit or implicit pseudo-time advancing techniques. Application to an unsteady problem can be treated in a similar manner, but with important extra difficulties.
Why do we need the Hessian?

Perturbative methods for uncertainty propagation:
- Method of Moments:
  μj ≈ j(μc) + (1/2) Σi Hii σi²
  σj² ≈ Σi Gi² σi² + (1/2) Σi,k Hik² σi² σk²
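The Method of Moments can be sketched on a hypothetical quadratic functional with independent Gaussian inputs (all numbers below are toy data); for a quadratic j these second-order formulas are exact, so a Monte Carlo reference must agree:

```python
import numpy as np

# Hypothetical functional j(c) = c0**2 + 3*c1, inputs c_i ~ N(mu_i, sigma_i^2).
mu = np.array([1.0, 2.0])
sigma = np.array([0.1, 0.2])

def j(c):
    return c[0]**2 + 3.0 * c[1]

G = np.array([2.0 * mu[0], 3.0])   # gradient dj/dc at mu
H = np.diag([2.0, 0.0])            # Hessian d2j/dc2 (constant here)

# Second-order moment estimates (the slide formulas):
mu_j = j(mu) + 0.5 * np.sum(np.diag(H) * sigma**2)
var_j = np.sum(G**2 * sigma**2) + 0.5 * np.sum(H**2 * np.outer(sigma**2, sigma**2))

# Monte Carlo reference.
rng = np.random.default_rng(1)
samples = rng.normal(mu, sigma, size=(200_000, 2))
jc = samples[:, 0]**2 + 3.0 * samples[:, 1]
print(abs(mu_j - jc.mean()))   # small: mean estimate matches sampling
print(abs(var_j - jc.var()))   # small: variance estimate matches sampling
```

The point of the slide is that only the gradient G and the Hessian H are needed, i.e. one adjoint plus one Hessian evaluation replaces thousands of flow solves.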
Robust optimization:
- Gradient-based methods for jrobust(c) = j(c) + ε ||dj/dc||
- Gradient-free methods for jrobust(c) = j(c) + (1/2) Σi Hii σi²
- Gradient-free methods for jrobust(c) = j(c) + k σj²

Adjoint-corrected functionals:
- Gradient-based methods for jcorr(c) = j(c) − ( Ψex(c, W) , Π0 )
Second derivative: Tangent-on-Tangent approach

d²j/dci dck = −Π0* D²i,kΨ + D²i,kJ

with the adjoint Π0 and θk = dW/dck solutions of:

(∂Ψ/∂W)* Π0 = (∂J/∂W)*
(∂Ψ/∂W) θk = −(∂Ψ/∂c) ek

and:

D²i,kJ = ∂/∂c(∂J/∂c ei) ek + ∂/∂W(∂J/∂c ei) θk + ∂/∂W(∂J/∂c ek) θi + ∂/∂W(∂J/∂W θi) θk
D²i,kΨ = ∂/∂c(∂Ψ/∂c ei) ek + ∂/∂W(∂Ψ/∂c ei) θk + ∂/∂W(∂Ψ/∂c ek) θi + ∂/∂W(∂Ψ/∂W θi) θk
Second derivative: Tangent-on-Reverse approach

∂/∂ci (∂j/∂c)* = (∂²j/∂c²) ei
= ∂/∂c(∂J/∂c)* ei + ∂/∂W(∂J/∂c)* θi − ∂/∂c[(∂Ψ/∂c)* Π0] ei − ∂/∂W[(∂Ψ/∂c)* Π0] θi − (∂Ψ/∂c)* λi

with Π0 solution of the adjoint system (∂Ψ/∂W)* Π0 = (∂J/∂W)*, and θi and λi solutions of the systems:

(∂Ψ/∂W) θi = −(∂Ψ/∂c) ei
(∂Ψ/∂W)* λi = ∂/∂c(∂J/∂W)* ei + ∂/∂W(∂J/∂W)* θi − ∂/∂c[(∂Ψ/∂W)* Π0] ei − ∂/∂W[(∂Ψ/∂W)* Π0] θi
Full Hessian: ToT vs. ToR

Common part:
- Adjoint state: (∂Ψ/∂W)* Π0 = (∂J/∂W)*
- n linear systems to solve: (∂Ψ/∂W) dW/dci = −(∂Ψ/∂c) ei

Specific parts (assuming unit cost for a single residual evaluation Ψ(c, W)):
- ToT: computation of D²i,kΨ and D²i,kJ ⇒ cost n(n + 1)/2 · αT²  (1 < αT < 4)
- ToR: n (adjoint) linear systems (∂Ψ/∂W)* λi = (the differentiated adjoint right-hand side of the previous slide) ⇒ cost n(niter + αT) αR  (1 < αR ≤ 5)

Use ToT when n < 2 αR (niter + αT) / αT².
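The cost comparison can be written as a small decision helper (the values of αT, αR and niter below are purely illustrative, within the bounds stated above):

```python
# Cost-model sketch for choosing ToT vs. ToR (slide formulas; unit cost
# for one residual evaluation, alpha_T/alpha_R are AD overhead factors).
def tot_cost(n, alpha_T):
    return n * (n + 1) / 2 * alpha_T**2

def tor_cost(n, n_iter, alpha_T, alpha_R):
    return n * (n_iter + alpha_T) * alpha_R

def prefer_tot(n, n_iter, alpha_T=3.0, alpha_R=4.0):
    # Slide criterion: use ToT when n < 2*alpha_R*(n_iter + alpha_T)/alpha_T**2
    return n < 2 * alpha_R * (n_iter + alpha_T) / alpha_T**2

# Few control parameters: ToT wins; many control parameters: ToR wins.
print(prefer_tot(n=2, n_iter=50))    # True
print(prefer_tot(n=200, n_iter=50))  # False
```

ToT scales quadratically in n but never solves extra linear systems, whereas each ToR direction costs one adjoint solve; the crossover grows with the solver iteration count niter.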
Application to reduced-order modeling of a 3D Euler flow

Functional to model:
- Inputs: Mach number and angle of attack
- Output: drag coefficient

Numerical basis: upwind vertex-centered finite volume, implicit pseudo-time advancing with a first-order Jacobian.
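The reduced quadratic model of the next slides is a second-order Taylor surrogate built from the gradient and Hessian delivered by the machinery above. A minimal sketch (the toy "drag" function and all numbers are hypothetical, not the NACA0012 data):

```python
import numpy as np

# Second-order Taylor surrogate around a nominal point c0 = (Mach, alpha).
def make_quadratic_model(j0, G, H, c0):
    def model(c):
        d = np.asarray(c) - c0
        return j0 + G @ d + 0.5 * d @ H @ d
    return model

# Hypothetical smooth "drag" standing in for the nonlinear simulations:
def drag(c):
    mach, alpha = c
    return 0.02 + 0.05 * (mach - 0.83)**2 + 0.003 * alpha**2

c0 = np.array([0.83, 2.0])
j0 = drag(c0)
G = np.array([0.0, 0.012])                # gradient of the toy drag at c0
H = np.array([[0.1, 0.0], [0.0, 0.006]])  # Hessian of the toy drag

model = make_quadratic_model(j0, G, H, c0)
# Since the toy drag is itself quadratic, the surrogate reproduces it exactly;
# for a real flow it matches to third order near c0.
c = np.array([0.86, 2.4])
print(abs(model(c) - drag(c)))  # ~0
```

One flow solve, one gradient and one Hessian then replace a full sweep of nonlinear simulations over the (Mach, angle-of-attack) range.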
Building the reduced quadratic model

(Figures: drag surfaces over Mach number and angle of attack for the nonlinear simulations, the first-order Taylor model (α = 2.0, M = 0.83) and the second-order Taylor model (α = 2.0, M = 0.83), plus the relative difference (%) between the nonlinear simulations and the second-order Taylor model.)
Comparing the reduced quadratic model with metamodels (*)

Metamodel    | Points | Expectation | Rel. error | Variance  | Rel. error
RBF          | 8      | 6.855e-3    | 0.029%     | 1.562e-7  | 0.579%
KRG          | 8      | 6.853e-3    | 0.058%     | 1.551e-7  | 0.128%
First-order  |        | 6.830e-3    | 0.393%     | 1.547e-7  | 0.386%
Second-order |        | 6.860e-3    | 0.043%     | 1.558e-7  | 0.321%

Memory in Mb:
- Flow solver: first-order Jacobian + block-diagonal preconditioner + work vectors: 86 + 10 + 35
- First derivatives: ILU(1) preconditioner + GMRES(200) + differentiated vectors: 170 + 250 + 120
- Second derivatives: ILU(1) preconditioner + GMRES(200) + differentiated vectors: 170 + 250 + 240

CPU time in seconds:
- Flow solver: 403 (Taylor) / 403 × 8 (metamodels)
- Gradient: 255
- Hessian: 278
- Total: 936 (Taylor) / 3224 (metamodels)

(*) Martinelli & Duvigneau, submitted.
Concluding remarks

Efficiency-oriented AD application:
- First and second derivative architecture well identified.
- Optimum use of modern iterative solvers.
- Use the reverse mode whenever necessary.
- When using the reverse mode, better efficiency is obtained with the new TAPENADE functionalities.

Current and future works:
- AD studies concerning MPI efficiency.
- Mesh adaptation, correction and accuracy control for large unsteady simulations.