Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction



  1. Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction. Laurent Hascoët, Mariano Vázquez, Alain Dervieux. Laurent.Hascoet@sophia.inria.fr. Tropics Project, INRIA Sophia-Antipolis. AD Workshop, Cranfield, June 5-6, 2003.

  2. PLAN:
  • Part 1: A gradient-based shape optimization to reduce the Sonic Boom
  • Part 2: Use and improvements of reverse AD to compute the Adjoint
  • Conclusion

  3. PART 1: GRADIENT-BASED SONIC BOOM REDUCTION
  • The Sonic Boom optimization problem
  • A mixed Adjoint/AD strategy
  • Resolution of the Adjoint and Gradient
  • Numerical results

  4. The Sonic Boom Problem
  Control points $\gamma$ ⇒ Geometry $\Omega(\gamma)$ ⇒ Euler flow $W(\gamma)$ ⇒ Pressure shock $\nabla p(W)$ ⇒ Cost function $j(\gamma)$

  5. Gradient-based Optimization
  $j(\gamma)$ models the strength of the downwards Sonic Boom
  ⇒ Compute the gradient $j'(\gamma)$ and use it in an optimization loop!
  ⇒ Use reverse-mode AD to compute this gradient
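
Concretely, each cycle of such a loop performs a descent update; a minimal sketch, where the step length $\rho_k$ would come from the 1-D search of the Overall Optimization Loop slide below:

$$\gamma_{k+1} = \gamma_k - \rho_k \, j'(\gamma_k)$$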

  6. Differentiate the iterative solver?
  $W(\gamma)$ is defined implicitly through the Euler equations $\Psi(\gamma, W) = 0$
  ⇒ The program uses an iterative solver
  ⇒ Brute-force reverse AD differentiates the whole iterative process
  • Does it make sense?
  • Is it efficient?
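
To see what is at stake, a minimal sketch, with a toy residual psi standing in for the Euler code: brute-force reverse AD of the solver loop must record one program state per iteration, so the tape grows with the iteration count even though only the converged state matters. The mixed strategy on the next slide avoids this by differentiating the residual alone.

import numpy as np

def psi(gamma, w):
    # toy stand-in for the Euler residual Psi(gamma, W)
    return w**3 - gamma

def solve_state(gamma, w, n_iters=200):
    tape = []                          # what black-box reverse AD would record
    for _ in range(n_iters):
        tape.append(w.copy())          # one stored iterate per solver step
        w = w - 0.1 * psi(gamma, w)    # pseudo-time stepping toward Psi = 0
    return w, tape

gamma = np.array([8.0])
w, tape = solve_state(gamma, np.ones(1))
print(w, len(tape))                    # w close to 2.0; 200 stored states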

  7. A mixed Adjoint/AD strategy
  Back to the mathematical adjoint system:
  $$\Psi(\gamma, W) = 0 \quad \text{(state system)}$$
  $$\frac{\partial J}{\partial W}(\gamma, W) - \left(\frac{\partial \Psi}{\partial W}(\gamma, W)\right)^{t} \cdot \Pi = 0 \quad \text{(adjoint system)}$$
  $$j'(\gamma) = \frac{\partial J}{\partial \gamma}(\gamma, W) - \left(\frac{\partial \Psi}{\partial \gamma}(\gamma, W)\right)^{t} \cdot \Pi \quad \text{(optimality condition: } j'(\gamma) = 0 \text{ at the optimum)}$$
  Lower level ⇒ reverse AD (cf. Part 2)
  Upper level ⇒ hand-coded specific solver for $\Pi$

  8. Upper level
  Solve $\left(\frac{\partial \Psi}{\partial W}(\gamma, W)\right)^{t} \cdot \Pi = \frac{\partial J}{\partial W}(\gamma, W)$
  ⇒ Use a matrix-free solver
  Preconditioning: "defect correction" ⇒ use the inverse of the 1st-order $\frac{\partial \Psi}{\partial W}$ to precondition the 2nd-order $\frac{\partial \Psi}{\partial W}$
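
A minimal sketch of such a matrix-free solve, assuming SciPy's GMRES; psib_matvec stands for the reverse-AD transposed product built in Part 2 and is faked here with an explicit matrix, and the defect-correction preconditioner is omitted:

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 100
rng = np.random.default_rng(0)
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # fake dPsi/dW
dJdW = rng.standard_normal(n)                        # fake right-hand side dJ/dW

def psib_matvec(pi):
    # in the real code: one reverse sweep of Psi returning (dPsi/dW)^t . pi
    return A.T @ pi

AT = LinearOperator((n, n), matvec=psib_matvec)
Pi, info = gmres(AT, dJdW)                           # matrix-free Krylov solve
print(info, np.linalg.norm(A.T @ Pi - dJdW))         # 0 and a small residual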

  9. Overall Optimization Loop
  Loop:
  • compute $\Pi$ with a matrix-free solver
  • use $\Pi$ to compute $j'(\gamma)$
  • 1-D search in the direction $j'(\gamma)$
  • update $\gamma$ (using transpiration conditions)
  Performance: assembling $\left(\frac{\partial \Psi}{\partial W}(\gamma, W)\right)^{t} \cdot \Pi$ takes about 7 times as long as assembling the state residual $\Psi(\gamma, W)$
  ⇒ Solving for $\Pi$ takes about 4 times as long as solving for $W$.
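
A toy instance of this loop end to end (everything below is a stand-in, not the paper's CFD code): with $\Psi(\gamma, W) = W - \gamma^2$ and $J(\gamma, W) = W$, the adjoint machinery must reproduce $j'(\gamma) = 2\gamma$, and eight cycles drive the cost toward zero.

def state(gamma):                 # solves Psi(gamma, W) = W - gamma**2 = 0
    return gamma**2

def adjoint(gamma, W):            # (dPsi/dW)^t . Pi = dJ/dW, with dPsi/dW = 1
    return 1.0

def grad(gamma, W, Pi):           # j'(gamma) = dJ/dgamma - (dPsi/dgamma)^t . Pi
    return 0.0 - (-2.0 * gamma) * Pi

gamma = 3.0
for cycle in range(8):            # 8 optimization cycles, as in the results slide
    W = state(gamma)
    Pi = adjoint(gamma, W)
    g = grad(gamma, W, Pi)
    rho = 0.25                    # fixed step standing in for the 1-D search
    gamma = gamma - rho * g
print(gamma, state(gamma))        # gamma -> 0, so the cost j(gamma) -> 0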

  10. Numerical results 1. Gradient $j'(\gamma)$ on the skin of the plane: [figure]

  11. Numerical results 2. Improvement of the sonic boom after 8 optimization cycles: [figure]

  12. PART 2: REVERSE AD TO COMPUTE THE ADJOINT
  • Some principles of Reverse AD
  • The "Tape" memory problem and the "Checkpointing" tactic
  • Impact of Data-Dependence Analysis
  • Impact of In-Out Data-Flow Analysis

  13. Principles of reverse AD
  AD rewrites source programs to make them compute derivatives.
  Consider $P: \{I_1; I_2; \ldots; I_p;\}$ implementing $f: \mathbb{R}^m \to \mathbb{R}^n$.
  Identify with: $f = f_p \circ f_{p-1} \circ \cdots \circ f_1$
  Name: $x_0 = x$ and $x_k = f_k(x_{k-1})$
  Chain rule: $f'(x) = f'_{p}(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdots f'_{1}(x_0)$
  $f'(x)$ is generally too large and expensive ⇒ take useful views!
  Tangent AD: $\dot{y} = f'(x) \cdot \dot{x} = f'_{p}(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdots f'_{1}(x_0) \cdot \dot{x}$
  Reverse AD: $\bar{x} = f'^{*}(x) \cdot \bar{y} = f'^{*}_{1}(x_0) \cdots f'^{*}_{p-1}(x_{p-2}) \cdot f'^{*}_{p}(x_{p-1}) \cdot \bar{y}$
  Evaluate both from right to left!
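
Both evaluation orders reduce to chains of matrix-vector products; a minimal sketch with random stand-in Jacobians, checked by the identity $\langle \dot{y}, \bar{y} \rangle = \langle \dot{x}, \bar{x} \rangle$:

import numpy as np

rng = np.random.default_rng(1)
Js = [rng.standard_normal((4, 4)) for _ in range(5)]   # stand-ins for J_1 ... J_p

xdot = rng.standard_normal(4)
ydot = xdot
for J in Js:                       # tangent mode: J_1 acts first, J_p last
    ydot = J @ ydot

ybar = rng.standard_normal(4)
xbar = ybar
for J in reversed(Js):             # reverse mode: J_p^T acts first, J_1^T last
    xbar = J.T @ xbar

print(np.isclose(ydot @ ybar, xdot @ xbar))   # True: <ydot, ybar> == <xdot, xbar>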

  14. Reverse AD is more tricky than Tangent AD
  $\bar{x} = f'^{*}(x) \cdot \bar{y} = f'^{*}_{1}(x_0) \cdots f'^{*}_{p-1}(x_{p-2}) \cdot f'^{*}_{p}(x_{p-1}) \cdot \bar{y}$
  Forward sweep: $x_0;\ x_1 = f_1(x_0);\ \ldots;\ x_{p-2} = f_{p-2}(x_{p-3});\ x_{p-1} = f_{p-1}(x_{p-2})$
  Backward sweep: $\bar{y};\ f'^{*}_{p}(x_{p-1}) \cdot \bar{y};\ f'^{*}_{p-1}(x_{p-2}) \cdot f'^{*}_{p}(x_{p-1}) \cdot \bar{y};\ \ldots;\ \bar{x} = f'^{*}_{1}(x_0) \cdots \bar{y}$
  Each backward step must retrieve the $x_k$ computed by the matching forward step.
  [Figure: the forward sweep runs down in time, the backward sweep runs back up, with "retrieve" arrows linking each $f'^{*}_{k}(x_{k-1})$ to the stored $x_{k-1}$.]
  Memory usage (the "Tape") is the bottleneck!

  15. Example
  Program fragment:
  ...
  v2 = 2*v1 + 5
  v4 = v2 + p1*v3/v2
  ...
  Corresponding transposed partial Jacobians, ordering the variables as $(v_1, v_2, v_3, v_4)$:
  $$f'^{*}(x) = \cdots \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 - \frac{p_1 v_3}{v_2^2} \\ 0 & 0 & 1 & \frac{p_1}{v_2} \\ 0 & 0 & 0 & 0 \end{pmatrix} \cdots$$
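
As a sanity check on the reconstruction above, a small NumPy sketch (point values arbitrary) that multiplies the two transposed partial Jacobians and compares them against a finite-difference Jacobian of the composed fragment, transposed:

import numpy as np

p1 = 3.0
v = np.array([1.0, 0.7, 2.0, 0.0])       # (v1, v2, v3, v4) before the fragment

def f1(x):                                # v2 = 2*v1 + 5
    v1, v2, v3, v4 = x
    return np.array([v1, 2*v1 + 5, v3, v4])

def f2(x):                                # v4 = v2 + p1*v3/v2
    v1, v2, v3, v4 = x
    return np.array([v1, v2, v3, v2 + p1*v3/v2])

def f1T(x):                               # transposed Jacobian of f1
    return np.array([[1., 2., 0., 0.],
                     [0., 0., 0., 0.],
                     [0., 0., 1., 0.],
                     [0., 0., 0., 1.]])

def f2T(x):                               # transposed Jacobian of f2
    v1, v2, v3, v4 = x
    return np.array([[1., 0., 0., 0.],
                     [0., 1., 0., 1 - p1*v3/v2**2],
                     [0., 0., 1., p1/v2],
                     [0., 0., 0., 0.]])

x1 = f1(v)                                # f2's Jacobian is evaluated at f1(v)
analytic = f1T(v) @ f2T(x1)

g = lambda x: f2(f1(x))                   # the composed fragment
eps = 1e-6
J_fd = np.column_stack([(g(v + eps*e) - g(v - eps*e)) / (2*eps)
                        for e in np.eye(4)])
print(np.allclose(analytic, J_fd.T, atol=1e-6))   # True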

  16. Reverse mode on the example
  Forward sweep:
  ...
  Push(v2)
  v2 = 2*v1 + 5
  Push(v4)
  v4 = v2 + p1*v3/v2
  ...
  Backward sweep:
  ...
  Pop(v4)
  v̄2 = v̄2 + v̄4*(1 - p1*v3/v2^2)
  v̄3 = v̄3 + v̄4*p1/v2
  v̄4 = 0
  Pop(v2)
  v̄1 = v̄1 + 2*v̄2
  v̄2 = 0
  ...
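
The sweeps above transcribe almost line for line into executable form; a minimal Python sketch where a list plays the role of the tape (p1 is treated as a passive parameter, and v̄ becomes the suffix b):

stack = []                                  # the "Tape"
def Push(x): stack.append(x)
def Pop(): return stack.pop()

p1 = 3.0
v1, v2, v3, v4 = 1.0, 0.5, 2.0, 0.0
v1b, v2b, v3b, v4b = 0.0, 0.0, 0.0, 0.0

# forward sweep: save each variable before it is overwritten
Push(v2); v2 = 2*v1 + 5
Push(v4); v4 = v2 + p1*v3/v2

# backward sweep: restore values and accumulate adjoints in reverse order
v4b = 1.0                                   # seed ybar
v4 = Pop()
v2b = v2b + v4b*(1 - p1*v3/v2**2)
v3b = v3b + v4b*p1/v2
v4b = 0.0
v2 = Pop()
v1b = v1b + 2*v2b
v2b = 0.0
print(v1b, v3b)                             # d v4/d v1 and d v4/d v3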

  17. Reverse mode on our application
  From subroutine Psi: $\gamma, W \mapsto \Psi(\gamma, W)$,
  use reverse AD to build subroutine $\overline{\text{Psi}}$: $\gamma, W, \bar{\Psi} \mapsto \left(\frac{\partial \Psi}{\partial W}(\gamma, W)\right)^{t} \cdot \bar{\Psi}$
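
A standard way to validate such an AD-generated routine is the dot-product test $\langle \frac{\partial \Psi}{\partial W}\dot{W}, \bar{\Psi} \rangle = \langle \dot{W}, (\frac{\partial \Psi}{\partial W})^{t}\bar{\Psi} \rangle$; a minimal sketch with a toy residual standing in for Psi and a hypothetical hand-written psib playing the role of the reverse-AD output:

import numpy as np

def psi(gamma, W):                 # toy stand-in for the Euler residual
    return W**2 - gamma

def psib(gamma, W, Psib):          # what reverse AD of psi would produce:
    return 2*W * Psib              # (dPsi/dW)^t . Psib (diagonal Jacobian here)

gamma = np.array([1.0, 2.0])
W = np.array([3.0, 4.0])
Psib = np.array([0.5, -1.0])       # arbitrary adjoint input
Wd = np.array([0.2, 0.7])          # arbitrary tangent direction

eps = 1e-7
Psid = (psi(gamma, W + eps*Wd) - psi(gamma, W - eps*Wd)) / (2*eps)  # (dPsi/dW) . Wd
print(np.isclose(Psid @ Psib, Wd @ psib(gamma, W, Psib)))           # True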
