Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction
Laurent Hascoët, Mariano Vázquez, Alain Dervieux
Laurent.Hascoet@sophia.inria.fr
Tropics Project, INRIA Sophia-Antipolis
AD Workshop, Cranfield, June 5-6, 2003
1 PLAN:
• Part 1: A gradient-based shape optimization to reduce the Sonic Boom
• Part 2: Utilization and improvements of reverse AD to compute the Adjoint
• Conclusion
2 PART 1: GRADIENT-BASED SONIC BOOM REDUCTION
• The Sonic Boom optimization problem
• A mixed Adjoint/AD strategy
• Resolution of the Adjoint and Gradient
• Numerical results
3 The Sonic Boom Problem
Control points γ ⇒ Geometry Ω(γ) ⇒ Euler flow W(γ) ⇒ Pressure shock ∇p(W) ⇒ Cost function j(γ)
4 Gradient-based Optimization
j(γ) models the strength of the downward Sonic Boom
⇒ compute the gradient j′(γ) and use it in an optimization loop!
⇒ use reverse-mode AD to compute this gradient
5 Differentiate the iterative solver?
W(γ) is defined implicitly through the Euler equations Ψ(γ, W) = 0
⇒ the program uses an iterative solver
⇒ brute-force reverse AD differentiates the whole iterative process
• Does it make sense?
• Is it efficient?
6 A mixed Adjoint/AD strategy
Back to the mathematical adjoint system:
  Ψ(γ, W) = 0  (state system)
  ∂J/∂W(γ, W) − (∂Ψ/∂W(γ, W))ᵗ · Π = 0  (adjoint system)
  j′(γ) = ∂J/∂γ(γ, W) − (∂Ψ/∂γ(γ, W))ᵗ · Π = 0  (optimality condition)
lower level ⇒ reverse AD (cf. Part 2)
upper level ⇒ hand-coded specific solver for Π
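For readers who want the step the slide skips, here is a standard sketch of where Π comes from (a reconstruction, not verbatim from the talk), written so that dW/dγ is never formed:

```latex
% Differentiate the state equation Psi(gamma, W(gamma)) = 0:
\frac{\partial\Psi}{\partial\gamma} + \frac{\partial\Psi}{\partial W}\,\frac{dW}{d\gamma} = 0
\quad\Rightarrow\quad
\frac{dW}{d\gamma} = -\left(\frac{\partial\Psi}{\partial W}\right)^{-1}\frac{\partial\Psi}{\partial\gamma}
\\[4pt]
% Chain rule on j(gamma) = J(gamma, W(gamma)):
j'(\gamma) = \frac{\partial J}{\partial\gamma} + \frac{\partial J}{\partial W}\,\frac{dW}{d\gamma}
           = \frac{\partial J}{\partial\gamma}
           - \frac{\partial J}{\partial W}\left(\frac{\partial\Psi}{\partial W}\right)^{-1}
             \frac{\partial\Psi}{\partial\gamma}
\\[4pt]
% Choosing Pi as the solution of the adjoint system eliminates the inverse
% (up to transposition conventions):
\left(\frac{\partial\Psi}{\partial W}\right)^{t}\Pi = \left(\frac{\partial J}{\partial W}\right)^{t}
\quad\Rightarrow\quad
j'(\gamma) = \frac{\partial J}{\partial\gamma} - \left(\frac{\partial\Psi}{\partial\gamma}\right)^{t}\Pi
```

One adjoint solve thus yields the whole gradient, versus one tangent solve per component of γ.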
7 Upper level
Solve (∂Ψ/∂W(γ, W))ᵗ · Π = ∂J/∂W(γ, W)
⇒ use a matrix-free solver
Preconditioning: “defect correction” ⇒ use the inverse of the 1st-order ∂Ψ/∂W to precondition the 2nd-order ∂Ψ/∂W
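A minimal sketch of such a matrix-free preconditioned solve (the callbacks `apply_jacobian_T` and `apply_precond` are hypothetical stand-ins for the reverse-AD product of Part 2 and the hand-coded first-order preconditioner; the choice of GMRES is an assumption, the talk only says "matrix-free solver"):

```python
from scipy.sparse.linalg import LinearOperator, gmres

def solve_adjoint(apply_jacobian_T, apply_precond, rhs):
    """Solve (dPsi/dW)^t . Pi = dJ/dW without ever forming the matrix.

    apply_jacobian_T(v): returns (dPsi/dW)^t . v, e.g. one call to the
        reverse-AD routine of Part 2 (hypothetical callback).
    apply_precond(v): approximate inverse of the *first-order* transposed
        Jacobian ("defect correction" preconditioner, hypothetical callback).
    rhs: the vector dJ/dW evaluated at the converged state.
    """
    n = rhs.shape[0]
    A = LinearOperator((n, n), matvec=apply_jacobian_T)  # 2nd-order operator
    M = LinearOperator((n, n), matvec=apply_precond)     # 1st-order preconditioner
    pi, info = gmres(A, rhs, M=M)
    if info != 0:
        raise RuntimeError(f"adjoint solve did not converge (info={info})")
    return pi
```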
8 Overall Optimization Loop
Loop:
• compute Π with a matrix-free solver
• use Π to compute j′(γ)
• 1-D search in the direction j′(γ)
• update γ (using transpiration conditions)
Performance: assembling (∂Ψ/∂W(γ, W))ᵗ · Π takes about 7 times as long as assembling the state residual Ψ(γ, W)
⇒ solving for Π takes about 4 times as long as solving for W.
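The loop in skeleton form (an illustrative sketch only; every callback name is a hypothetical stand-in for the corresponding solver or update described above):

```python
def optimize_shape(gamma, solve_state, solve_adjoint, assemble_gradient,
                   line_search, apply_transpiration, n_cycles=8):
    """Skeleton of the optimization loop; the callbacks stand in for the
    state solver, the adjoint solver of the previous slide, the gradient
    assembly, the 1-D search, and the transpiration-based shape update."""
    for _ in range(n_cycles):
        W = solve_state(gamma)                  # Psi(gamma, W) = 0
        pi = solve_adjoint(gamma, W)            # (dPsi/dW)^t . Pi = dJ/dW
        grad = assemble_gradient(gamma, W, pi)  # dJ/dgamma - (dPsi/dgamma)^t . Pi
        step = line_search(gamma, grad)         # 1-D search along j'(gamma)
        gamma = apply_transpiration(gamma, step, grad)  # update control points
    return gamma
```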
9 Numerical results 1
Gradient j′(γ) on the skin of the plane: [figure]
10 Numerical results 2
Improvement of the sonic boom after 8 optimization cycles: [figure]
11 PART 2: REVERSE AD TO COMPUTE THE ADJOINT
• Some principles of reverse AD
• The “Tape” memory problem, the “Checkpointing” tactic
• Impact of Data-Dependence Analysis
• Impact of In-Out Data-Flow Analysis
12 Principles of reverse AD
AD rewrites source programs to make them compute derivatives.
consider: P : { I_1; I_2; ...; I_p } implementing f : ℝᵐ → ℝⁿ
identify with: f = f_p ∘ f_{p−1} ∘ ⋯ ∘ f_1
name: x_0 = x and x_k = f_k(x_{k−1})
chain rule: f′(x) = f′_p(x_{p−1}) · f′_{p−1}(x_{p−2}) · ⋯ · f′_1(x_0)
f′(x) generally too large and expensive ⇒ take useful views!
  ẏ = f′(x) · ẋ = f′_p(x_{p−1}) · ⋯ · f′_1(x_0) · ẋ  (tangent AD)
  x̄ = f′*(x) · ȳ = f′*_1(x_0) · ⋯ · f′*_{p−1}(x_{p−2}) · f′*_p(x_{p−1}) · ȳ  (reverse AD)
Evaluate both from right to left!
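A quick illustrative check of the two views (not from the talk; a tiny two-factor composition with hand-written Jacobians J1, J2, all names hypothetical). Tangent mode propagates ẋ alongside the primal sweep; reverse mode applies the transposed factors right to left, which requires the intermediate state x_1 from a prior forward run, and the results are dual: ⟨ȳ, ẏ⟩ = ⟨x̄, ẋ⟩.

```python
import numpy as np

# f = f2 o f1 with hand-written Jacobians (illustrative example)
def f1(x):  return np.array([x[0] * x[1], np.sin(x[0])])
def J1(x):  return np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])
def f2(x):  return np.array([x[0] + x[1] ** 2])
def J2(x):  return np.array([[1.0, 2.0 * x[1]]])

x0 = np.array([1.5, -0.5])
x1 = f1(x0)                          # forward run; reverse mode must keep x1

xdot = np.array([1.0, 2.0])
ydot = J2(x1) @ (J1(x0) @ xdot)      # tangent AD: f2'(x1) . f1'(x0) . xdot

ybar = np.array([1.0])
xbar = J1(x0).T @ (J2(x1).T @ ybar)  # reverse AD: f1'*(x0) . f2'*(x1) . ybar

assert np.isclose(ybar @ ydot, xbar @ xdot)  # <ybar, ydot> == <xbar, xdot>
```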
13 Reverse AD is more tricky than Tangent AD
x̄ = f′*(x) · ȳ = f′*_1(x_0) · ⋯ · f′*_{p−1}(x_{p−2}) · f′*_p(x_{p−1}) · ȳ
forward sweep (run the original program, storing the intermediate states):
  x_0 ; x_1 = f_1(x_0) ; ... ; x_{p−2} = f_{p−2}(x_{p−3}) ; x_{p−1} = f_{p−1}(x_{p−2})
backward sweep (apply the transposed derivatives right to left, retrieving each stored x_k):
  ȳ ; f′*_p(x_{p−1}) · ; f′*_{p−1}(x_{p−2}) · ; ... ; x̄ = f′*_1(x_0) ·
Memory usage (“Tape”) is the bottleneck!
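The same two-sweep schedule in a few lines of Python (a minimal sketch; `steps` is a hypothetical list of (f_k, vjp_k) pairs where vjp_k(x, v) returns f′_k(x)ᵗ · v), making the tape explicit:

```python
def reverse_sweep(steps, x0, ybar):
    """Two-sweep reverse-AD schedule over f = f_p o ... o f_1."""
    # forward sweep: run the primal program, recording every state (the "Tape")
    tape = [x0]
    for f_k, _ in steps:
        tape.append(f_k(tape[-1]))          # memory grows linearly with p
    # backward sweep: apply transposed derivatives right to left,
    # retrieving the matching recorded state x_{k-1} for each factor
    xbar = ybar
    for (_, vjp_k), x_prev in zip(reversed(steps), reversed(tape[:-1])):
        xbar = vjp_k(x_prev, xbar)
    return xbar                              # = f'(x0)^t . ybar
```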
14 Example
Program fragment:
  ...
  v2 = 2*v1 + 5
  v4 = v2 + p1*v3/v2
  ...
Corresponding transposed partial Jacobians (on the variables (v1, v2, v3, v4); the rightmost factor is applied to ȳ first):

f′*(x) = ⋯ ·
\begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 - p_1 v_3 / v_2^2 \\ 0 & 0 & 1 & p_1 / v_2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
· ⋯
15 Reverse mode on the example
Forward sweep:
  ...
  Push(v2)
  v2 = 2*v1 + 5
  Push(v4)
  v4 = v2 + p1*v3/v2
  ...
Backward sweep:
  ...
  Pop(v4)
  v̄2 = v̄2 + v̄4*(1 − p1*v3/v2²)
  v̄3 = v̄3 + v̄4*p1/v2
  v̄4 = 0
  Pop(v2)
  v̄1 = v̄1 + 2*v̄2
  v̄2 = 0
  ...
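A runnable rendition of this fragment (an illustrative sketch; the Python names are hypothetical, and the slide's Push/Pop become an explicit stack), checked against a finite difference of the primal fragment:

```python
def fragment_reverse(v1, v2, v3, v4, p1, v1b, v2b, v3b, v4b):
    stack = []
    # forward sweep: save each value about to be overwritten
    stack.append(v2); v2 = 2 * v1 + 5           # Push(v2)
    stack.append(v4); v4 = v2 + p1 * v3 / v2    # Push(v4)
    # backward sweep: restore values, then accumulate adjoints
    v4 = stack.pop()                            # Pop(v4)
    v2b += v4b * (1 - p1 * v3 / v2 ** 2)        # v2 still holds its post-stmt-1 value
    v3b += v4b * p1 / v2
    v4b = 0.0
    v2 = stack.pop()                            # Pop(v2)
    v1b += 2 * v2b
    v2b = 0.0
    return v1b, v2b, v3b, v4b

# check d(v4)/d(v1) against a finite difference of the primal fragment
def primal(v1, v3, p1):
    v2 = 2 * v1 + 5
    return v2 + p1 * v3 / v2

eps = 1e-7
fd = (primal(1.0 + eps, 2.0, 3.0) - primal(1.0, 2.0, 3.0)) / eps
v1b, _, _, _ = fragment_reverse(1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0, 0.0, 1.0)
assert abs(v1b - fd) < 1e-5
```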
16 Reverse mode on our application
From subroutine Psi:
  Psi : (γ, W) ↦ Ψ(γ, W),
use reverse AD to build the adjoint subroutine Psi̅:
  Psi̅ : (γ, W, Ψ̅) ↦ (∂Ψ/∂W(γ, W))ᵗ · Ψ̅
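This is exactly the matrix-vector product the upper-level solver of Part 1 consumes. A sketch of the glue (the Python names are hypothetical stand-ins for the generated code):

```python
def make_adjoint_matvec(psi_bar, gamma, W):
    """Freeze (gamma, W) at the converged state and expose the AD-generated
    routine psi_bar(gamma, W, v) -> (dPsi/dW(gamma, W))^t . v as the
    matrix-vector product used by the matrix-free adjoint solver of Part 1."""
    return lambda v: psi_bar(gamma, W, v)
```

Each Krylov iteration of the Π-solve then costs one reverse sweep through Psi, roughly the 7× residual-assembly factor quoted on slide 8.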