
The TAPENADE AD Tool, by Laurent Hascoët, Valérie Pascual and Rose-Marie Greborio (presentation slides)

  1. The TAPENADE AD Tool
     Laurent Hascoët, Valérie Pascual, Rose-Marie Greborio (Laurent.Hascoet@sophia.inria.fr)
     Tropics Project, INRIA Sophia-Antipolis
     AD Workshop, Cranfield, June 5-6, 2003

  2. Plan
     • AD: principles of Tangent and Reverse
     • Tapenade: technology from Compilation and Parallelization
     • Call Graphs, Flow Graphs, Symbol Tables
     • Static Analyses on Flow Graphs
     • Dependency Analysis
     • Tapenade: an AD tool on the web
     • Further Developments

  6. AD: Principles of Tangent and Reverse
     AD rewrites source programs to make them compute derivatives.
     Consider: a program $P: \{I_1; I_2; \ldots; I_p;\}$ implementing $f: \mathbb{R}^m \to \mathbb{R}^n$
     Identify with: $f = f_p \circ f_{p-1} \circ \cdots \circ f_1$
     Name: $x_0 = x$ and $x_k = f_k(x_{k-1})$
     Chain rule: $f'(x) = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdots f'_1(x_0)$
     $f'(x)$ is generally too large and expensive to build ⇒ take useful views!
       $\dot{y} = f'(x)\cdot\dot{x} = f'_p(x_{p-1})\cdot f'_{p-1}(x_{p-2}) \cdots f'_1(x_0)\cdot\dot{x}$   (tangent AD)
       $\bar{x} = f'^{*}(x)\cdot\bar{y} = f'^{*}_1(x_0) \cdots f'^{*}_{p-1}(x_{p-2})\cdot f'^{*}_p(x_{p-1})\cdot\bar{y}$   (reverse AD)
     Evaluate both from right to left! Each step is then a matrix-vector product, and $f'(x)$ itself is never formed.
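
A minimal Python sketch of the two right-to-left evaluations, assuming the partial Jacobians $f'_k(x_{k-1})$ are already available as small dense matrices. The values of `J1` and `J2` are made up, and a real AD tool never stores these matrices: it emits code that applies them. The helper names `tangent` and `reverse` are hypothetical. The last line performs the usual dot-product test relating the two modes.

```python
# Sketch: tangent and reverse AD as right-to-left products of partial
# Jacobians. J1, J2 are made-up constant matrices standing in for
# f'_1(x_0) and f'_2(x_1).

def matvec(a, v):
    """Product a.v for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def tangent(jacobians, xdot):
    """ydot = J_p . ... . J_1 . xdot, applying J_1 first (right to left)."""
    for jac in jacobians:
        xdot = matvec(jac, xdot)
    return xdot

def reverse(jacobians, ybar):
    """xbar = J_1^* . ... . J_p^* . ybar, applying J_p^* first (right to left)."""
    for jac in reversed(jacobians):
        ybar = matvec(transpose(jac), ybar)
    return ybar

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

J1 = [[1.0, 2.0, 0.0],
      [0.0, 1.0, 3.0]]     # hypothetical f'_1: R^3 -> R^2
J2 = [[2.0, 1.0],
      [1.0, -1.0]]         # hypothetical f'_2: R^2 -> R^2
jacs = [J1, J2]

xdot = [1.0, 0.0, 2.0]
ybar = [1.0, 1.0]
ydot = tangent(jacs, xdot)   # f'(x).xdot
xbar = reverse(jacs, ybar)   # f'*(x).ybar
# Dot-product test: <ybar, f'(x).xdot> must equal <f'*(x).ybar, xdot>.
print(ydot, xbar, dot(ybar, ydot), dot(xbar, xdot))
# prints: [8.0, -5.0] [3.0, 6.0, 0.0] 3.0 3.0
```

Each mode costs one matrix-vector product per step: tangent mode propagates one input direction $\dot{x}$, reverse mode one output weighting $\bar{y}$.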

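  8. AD: Example
     Program fragment:
       ...
       v2 = 2*v1 + 5
       v4 = v2 + p1*v3/v2
       ...
     The corresponding fragment of the Jacobian, acting on $(v_1, v_2, v_3, v_4)$, is:
     $$f'(x) = \cdots \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 - \frac{p_1 v_3}{v_2^2} & \frac{p_1}{v_2} & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdots$$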
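
The two partial Jacobians can be checked numerically. This Python sketch treats the fragment as a map on the state $(v_1, v_2, v_3, v_4)$, with an arbitrary value `P1 = 3.0` for $p_1$ and an arbitrary point. It multiplies the two matrices (later statement on the left) and compares the product with central finite differences. All function names are hypothetical.

```python
# Sketch: the example fragment as a map on the state (v1, v2, v3, v4).
# P1 and the point x are arbitrary values chosen for illustration.
P1 = 3.0

def run(v):
    """Execute the fragment on the state v = [v1, v2, v3, v4]."""
    v1, v2, v3, v4 = v
    v2 = 2 * v1 + 5
    v4 = v2 + P1 * v3 / v2
    return [v1, v2, v3, v4]

def jac_stmt_v2():
    """Partial Jacobian of v2 = 2*v1 + 5 (identity except the v2 row)."""
    return [[1, 0, 0, 0], [2, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def jac_stmt_v4(v2, v3):
    """Partial Jacobian of v4 = v2 + p1*v3/v2 (identity except the v4 row)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 1 - P1 * v3 / v2**2, P1 / v2, 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def finite_diff_column(f, x, j, h=1e-6):
    """Column j of the Jacobian of f, by central differences."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return [(p - m) / (2 * h) for p, m in zip(f(xp), f(xm))]

x = [1.0, 0.0, 2.0, 0.0]
v2_new = 2 * x[0] + 5        # value of v2 seen by the second statement
# Later statement on the left: f' = J(v4 statement) . J(v2 statement)
J = matmul(jac_stmt_v4(v2_new, x[2]), jac_stmt_v2())
for j in range(4):
    print(j, [round(J[i][j], 6) for i in range(4)], finite_diff_column(run, x, j))
```

The zero on the diagonal of each statement's matrix reflects that the assigned variable is overwritten and does not appear on its own right-hand side.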

  10. Tangent AD
      Tangent AD keeps the structure of P: $\dot{y} = f'(x)\cdot\dot{x} = f'_p(x_{p-1})\cdot f'_{p-1}(x_{p-2}) \cdots f'_1(x_0)\cdot\dot{x}$
        ...
        v̇2 = 2*v̇1
        v2 = 2*v1 + 5
        v̇4 = v̇2*(1 - p1*v3/v2**2) + v̇3*p1/v2
        v4 = v2 + p1*v3/v2
        ...
      It just inserts the products $\dot{x}_k = f'_k(x_{k-1})\cdot\dot{x}_{k-1}$ for $k = 1$ to $p$.
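
A Python transcription of this tangent code, as a sketch: names ending in `d` stand for the dotted variables, and `P1`, the point and the direction are arbitrary. As in the slide, each derivative statement sits just before the statement it differentiates. The result is compared with a central finite difference taken along the same direction.

```python
# Sketch of the tangent code for the example fragment.
# v1d, v2d, v3d, v4d stand for the dotted variables; P1 is arbitrary.
P1 = 3.0

def fragment(v1, v3):
    """Original fragment, returning v4."""
    v2 = 2 * v1 + 5
    v4 = v2 + P1 * v3 / v2
    return v4

def fragment_d(v1, v3, v1d, v3d):
    """Tangent fragment: returns (v4, v4d)."""
    v2d = 2 * v1d                                         # tangent of v2 = 2*v1 + 5
    v2 = 2 * v1 + 5
    v4d = v2d * (1 - P1 * v3 / v2**2) + v3d * P1 / v2     # tangent of v4 = ...
    v4 = v2 + P1 * v3 / v2
    return v4, v4d

# Compare with a central finite difference along the same direction.
v1, v3, v1d, v3d, h = 1.0, 2.0, 1.0, 0.5, 1e-6
v4, v4d = fragment_d(v1, v3, v1d, v3d)
fd = (fragment(v1 + h * v1d, v3 + h * v3d)
      - fragment(v1 - h * v1d, v3 - h * v3d)) / (2 * h)
print(v4d, fd)
```

The tangent code runs the original computation once and adds a fixed number of operations per statement, which is why it keeps the structure of P.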

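  19. AD: Reverse is more tricky than Tangent
      $\bar{x} = f'^{*}(x)\cdot\bar{y} = f'^{*}_1(x_0) \cdots f'^{*}_{p-1}(x_{p-2})\cdot f'^{*}_p(x_{p-1})\cdot\bar{y}$
      Evaluated right to left, this needs the $x_k$ in the reverse of the order in which P computes them. Hence two sweeps, in time order:
      Forward sweep:
        x_0;
        x_1 = f_1(x_0);
        ...
        x_{p-2} = f_{p-2}(x_{p-3});
        x_{p-1} = f_{p-1}(x_{p-2});
      Backward sweep: start from $\bar{y}$, apply $f'^{*}_p(x_{p-1})$, then $f'^{*}_{p-1}(x_{p-2})$, ..., and finally $f'^{*}_1(x_0)$ to obtain $\bar{x}$. Each step retrieves the value $x_{k-1}$ stored during the forward sweep.
      Memory usage (the “Tape”) is the bottleneck, since the values needed by the backward sweep must be kept until it retrieves them!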
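
The two sweeps can be sketched in Python on a scalar chain, assuming each step is given together with its derivative (the chain $y = \exp(\sin(x)^2)$ and the name `reverse_with_tape` are illustrative). A Python list plays the role of the tape: the forward sweep stores each $x_{k-1}$, the backward sweep retrieves them in reverse order, and the tape length at the turning point shows why memory is the bottleneck.

```python
import math

# Hypothetical scalar chain x_k = f_k(x_{k-1}), each step given as (f_k, f'_k):
# y = exp(sin(x)**2).
STEPS = [
    (math.sin, math.cos),
    (lambda x: x * x, lambda x: 2 * x),
    (math.exp, math.exp),
]

def reverse_with_tape(x0, ybar, steps=STEPS):
    """Return (y, xbar, peak tape length) for the chain `steps`."""
    tape = []
    x = x0
    for f, _ in steps:              # forward sweep
        tape.append(x)              # store x_{k-1}: f'_k needs it later
        x = f(x)
    y, peak = x, len(tape)          # every needed value is on the tape here
    xbar = ybar
    for _, df in reversed(steps):   # backward sweep
        xbar = df(tape.pop()) * xbar    # retrieve x_{k-1}, apply f'_k(x_{k-1})^*
    return y, xbar, peak

x0 = 0.7
y, xbar, peak = reverse_with_tape(x0, 1.0)
exact = math.exp(math.sin(x0) ** 2) * 2 * math.sin(x0) * math.cos(x0)
print(y, xbar, exact, peak)
```

The tape grows with the length of the chain, while the backward sweep consumes it last-in, first-out.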

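  21. AD: Continued Example
      Program fragment:
        ...
        v2 = 2*v1 + 5
        v4 = v2 + p1*v3/v2
        ...
      Corresponding transposed partial Jacobians:
      $$f'^{*}(x) = \cdots \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 - \frac{p_1 v_3}{v_2^2} \\ 0 & 0 & 1 & \frac{p_1}{v_2} \\ 0 & 0 & 0 & 0 \end{pmatrix} \cdots$$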
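
Applying these transposed matrices to a vector, last statement first, is exactly the backward sweep. A Python sketch, with `P1 = 3.0` and the point chosen arbitrarily: seeding $\bar{v}_4 = 1$ yields the gradient of $v_4$ with respect to the fragment's inputs. The function names are hypothetical.

```python
# Sketch: reverse mode on the example as products of the transposed partial
# Jacobians, last statement first. P1 and the point are arbitrary.
P1 = 3.0

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def jac_stmt_v2():
    """Partial Jacobian of v2 = 2*v1 + 5."""
    return [[1, 0, 0, 0], [2, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def jac_stmt_v4(v2, v3):
    """Partial Jacobian of v4 = v2 + p1*v3/v2."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 1 - P1 * v3 / v2**2, P1 / v2, 0]]

v1, v3 = 1.0, 2.0
v2 = 2 * v1 + 5                     # v2 as seen by the v4 statement
vbar = [0.0, 0.0, 0.0, 1.0]         # seed: vbar4 = 1
vbar = matvec(transpose(jac_stmt_v4(v2, v3)), vbar)   # applied first
vbar = matvec(transpose(jac_stmt_v2()), vbar)          # applied second
# vbar is now (dv4/dv1, 0, dv4/dv3, 0); the zeros come from v2 and v4
# being overwritten by the fragment.
print(vbar)
```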

  24. AD: Reverse mode on the example
      Forward sweep:
        ...
        Push(v2)
        v2 = 2*v1 + 5
        Push(v4)
        v4 = v2 + p1*v3/v2
        ...
      Backward sweep:
        ...
        Pop(v4)
        v̄2 = v̄2 + v̄4*(1 - p1*v3/v2**2)
        v̄3 = v̄3 + v̄4*p1/v2
        v̄4 = 0
        Pop(v2)
        v̄1 = v̄1 + 2*v̄2
        v̄2 = 0
        ...
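
A Python transcription of this reverse code, as a sketch: a list stands in for the tape, names ending in `b` stand for the barred variables, and the tangent code of the example is included only to run the standard dot-product test $\langle \bar{v}, J\dot{v}\rangle = \langle J^{*}\bar{v}, \dot{v}\rangle$. All values, including `P1`, are arbitrary, and the function names are hypothetical.

```python
# Sketch of the reverse code for the example, with a Python list as the tape.
P1 = 3.0

def fragment_d(v, vd):
    """Tangent code, used only for the dot-product check."""
    v1, v2, v3, v4 = v
    v1d, v2d, v3d, v4d = vd
    v2d = 2 * v1d
    v2 = 2 * v1 + 5
    v4d = v2d * (1 - P1 * v3 / v2**2) + v3d * P1 / v2
    v4 = v2 + P1 * v3 / v2
    return [v1d, v2d, v3d, v4d]

def fragment_b(v, vb):
    """Return (state restored by the tape, adjoints after the backward sweep)."""
    tape = []
    v1, v2, v3, v4 = v
    v1b, v2b, v3b, v4b = vb
    # forward sweep
    tape.append(v2)                 # Push(v2)
    v2 = 2 * v1 + 5
    tape.append(v4)                 # Push(v4)
    v4 = v2 + P1 * v3 / v2
    # backward sweep
    v4 = tape.pop()                 # Pop(v4)
    v2b = v2b + v4b * (1 - P1 * v3 / v2**2)
    v3b = v3b + v4b * P1 / v2
    v4b = 0.0
    v2 = tape.pop()                 # Pop(v2)
    v1b = v1b + 2 * v2b
    v2b = 0.0
    return [v1, v2, v3, v4], [v1b, v2b, v3b, v4b]

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

v = [1.0, 9.0, 2.0, -4.0]
vd = [0.3, -1.0, 0.5, 2.0]
vb = [0.2, 0.7, -0.4, 1.5]
restored, vbar = fragment_b(v, vb)
lhs = dot(vb, fragment_d(v, vd))     # <vbar_seed, J . vd>
rhs = dot(vbar, vd)                  # <J^T . vbar_seed, vd>
print(restored, lhs, rhs)
```

Note that the adjoint of the v4 statement still uses the new value of v2: only Pop(v2), placed after it, restores the old one.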

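  26. AD: The Checkpointing tactic
      A Storage/Recomputation tradeoff: a checkpointed piece of code is first run without taping its intermediate values, and rerun with taping only when the backward sweep reaches it, so that only a snapshot of its inputs is stored in between.
      Tapenade does it on the Call Graph. [Figure not reproduced]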
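
The tradeoff can be measured on a toy example. This Python sketch assumes a scalar chain of four steps and a single checkpoint covering the first two; it is a one-level illustration of the idea, not Tapenade's call-graph mechanism. `reverse_segment`, `checkpointed` and the `stats` counters are hypothetical names.

```python
import math

# Hypothetical scalar chain of four steps (f_k, f'_k).
STEPS = [
    (math.sin, math.cos),
    (lambda x: x * x, lambda x: 2 * x),
    (math.exp, math.exp),
    (lambda x: 3 * x, lambda x: 3.0),
]

def reverse_segment(steps, x, ybar, stats):
    """Taped forward sweep then backward sweep; returns (output, input adjoint)."""
    tape = []
    for f, _ in steps:
        tape.append(x)
        x = f(x)
        stats["evals"] += 1
    stats["peak"] = max(stats["peak"], len(tape))
    for _, df in reversed(steps):
        ybar = df(tape.pop()) * ybar
    return x, ybar

def checkpointed(steps, x0, ybar, split, stats):
    """One checkpoint C = steps[:split], followed by D = steps[split:]."""
    c, d = steps[:split], steps[split:]
    snapshot = x0                     # store only what C needs to rerun
    x = x0
    for f, _ in c:                    # run C once without taping
        x = f(x)
        stats["evals"] += 1
    y, zbar = reverse_segment(d, x, ybar, stats)          # reverse D
    _, xbar = reverse_segment(c, snapshot, zbar, stats)   # rerun C taped, reverse it
    return y, xbar

plain, ckp = {"evals": 0, "peak": 0}, {"evals": 0, "peak": 0}
y1, g1 = reverse_segment(STEPS, 0.7, 1.0, plain)
y2, g2 = checkpointed(STEPS, 0.7, 1.0, 2, ckp)
print(g1, g2, plain, ckp)
```

Here the checkpoint halves the peak tape length (4 to 2) at the cost of re-executing the first two steps (6 step evaluations instead of 4), while the derivative is unchanged.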

  27. Tapenade: Internal Representation
      Take advantage of well-known techniques from Compilation and Parallelization:
      • Use a general abstract Imperative Language (IL)
      • Represent programs as Call Graphs of Flow Graphs
      • Store symbol declarations in nested Symbol Tables
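
One way to picture such a representation is sketched in Python: nested symbol tables that defer lookups to their enclosing scope, procedures that own a flow graph of basic blocks, and a call graph given by each procedure's callees. The class names are hypothetical and do not describe Tapenade's actual internal classes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SymbolTable:
    """Nested scope: lookups fall back to the enclosing table."""
    symbols: dict = field(default_factory=dict)
    parent: Optional["SymbolTable"] = None

    def lookup(self, name):
        table = self
        while table is not None:
            if name in table.symbols:
                return table.symbols[name]
            table = table.parent
        raise KeyError(name)

@dataclass
class Block:
    """Basic block of a flow graph: straight-line IL instructions."""
    instructions: list
    successors: list = field(default_factory=list)

@dataclass
class Unit:
    """One procedure: its flow graph entry, its local symbols, its callees."""
    name: str
    entry: Block
    symbols: SymbolTable
    callees: list = field(default_factory=list)

def call_order(root):
    """Units reachable from root in depth-first post-order (callees first)."""
    seen, order = set(), []

    def visit(unit):
        if unit.name in seen:
            return
        seen.add(unit.name)
        for callee in unit.callees:
            visit(callee)
        order.append(unit.name)

    visit(root)
    return order

globals_table = SymbolTable({"n": "integer"})
sub_syms = SymbolTable({"x": "real"}, parent=globals_table)
leaf = Unit("f", Block(["y = x*x"]), sub_syms)
main = Unit("main", Block(["call f(x)"]), SymbolTable(parent=globals_table), [leaf])
print(call_order(main))   # ['f', 'main']
```

A callees-first order of this kind is convenient for bottom-up analyses, where a procedure's summary is computed before its callers use it.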

  29. Application: Inversion of the Flow Graph

  30. Application: Loop Inversion

  31. Tapenade: Global Static Analyses on Flow Graphs
      • classical IN-OUT analysis
      • forward dependence with respect to independent inputs
      • backward influence on dependent outputs
      • specific TBR (To Be Recorded) analysis for the reverse mode
      • ... pointer analysis ...
      Usual restrictions: conservative assumptions, arrays, ...
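
A minimal Python sketch of the forward and backward dependence analyses on straight-line code, assuming each statement is given as (assigned variable, variables read) and ignoring arrays, pointers and control flow. A variable assigned by a statement is called active here when, just after that statement, it both depends on the independent inputs and influences the dependent outputs. The program and the function name `activity` are hypothetical.

```python
# Hypothetical straight-line program: (assigned variable, variables read).
PROGRAM = [
    ("a", {"x"}),
    ("b", {"c"}),        # b never depends on x
    ("d", {"a", "b"}),
    ("e", {"a"}),        # e never influences y
    ("y", {"d"}),
]

def activity(program, inputs, outputs):
    """Assigned variables that are both varied and useful after their statement."""
    varied = []                       # varied[k]: after statement k (forward)
    v = set(inputs)
    for lhs, rhs in program:
        v = v | {lhs} if rhs & v else v - {lhs}
        varied.append(v)
    useful = [None] * len(program)    # useful[k]: after statement k (backward)
    u = set(outputs)
    for k in range(len(program) - 1, -1, -1):
        useful[k] = u
        lhs, rhs = program[k]
        if lhs in u:
            u = (u - {lhs}) | rhs
    return [lhs for k, (lhs, _) in enumerate(program)
            if lhs in varied[k] and lhs in useful[k]]

print(activity(PROGRAM, {"x"}, {"y"}))   # ['a', 'd', 'y']
```

Only the active statements need derivative code; here the statements assigning `b` and `e` can be left undifferentiated.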

  32. Application: reduced snapshots
      Snapshot = IN(checkpoint) ∩ OUT(checkpoint and after)
      i.e., store only the variables that the checkpoint reads and that may be overwritten before it is rerun.
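
A Python sketch of the reduced snapshot on straight-line code, assuming statements given as (written variable, variables read), IN computed as the variables read before being written, and OUT as the variables written. The statements of the checkpoint `C` and of the code `D` after it are hypothetical; variable `c` is read by the checkpoint but never overwritten, so it stays out of the snapshot.

```python
# Statements as (written variable, variables read); hypothetical example.
C = [("t", {"a", "c"}), ("a", {"t"})]      # the checkpointed part
D = [("b", {"a"}), ("y", {"b", "c"})]      # the code after the checkpoint

def in_set(stmts):
    """Variables read before being written (upward-exposed uses)."""
    written, used = set(), set()
    for lhs, rhs in stmts:
        used |= rhs - written
        written.add(lhs)
    return used

def out_set(stmts):
    """Variables possibly written."""
    return {lhs for lhs, _ in stmts}

snapshot = in_set(C) & (out_set(C) | out_set(D))
print(sorted(in_set(C)), sorted(snapshot))   # ['a', 'c'] ['a']
```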

  33. Tapenade: Using Data Dependencies
      • flow: write x → read x
      • anti: read x → write x
      • output: write x → write x
      Data Dependencies form:
      • a partial order between run-time instructions,
      • a graph between textual instructions.
      Any shuffle of the instructions that respects the Data Dependencies is valid!
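
A small Python sketch of dependence-based reordering on straight-line code. It takes the three statements of the Loop Fusion example (a = 2.0*a + 10.0, b = c + sin(a), c = 0.0) plus a hypothetical independent statement d = 2.0*d, computes flow, anti and output dependencies between textual instructions, and checks whether a given shuffle respects them. The labels `S1` to `S4` and the function names are illustrative.

```python
# Each statement: (variables read, variables written).
STMTS = {
    "S1": ({"a"}, {"a"}),          # a = 2.0*a + 10.0
    "S2": ({"c", "a"}, {"b"}),     # b = c + sin(a)
    "S3": (set(), {"c"}),          # c = 0.0
    "S4": ({"d"}, {"d"}),          # d = 2.0*d   (hypothetical, independent)
}
ORDER = ["S1", "S2", "S3", "S4"]

def dependencies(order, stmts):
    """Set of (earlier, later, kind) dependencies between textual statements."""
    deps = set()
    for i, si in enumerate(order):
        ri, wi = stmts[si]
        for sj in order[i + 1:]:
            rj, wj = stmts[sj]
            if wi & rj:
                deps.add((si, sj, "flow"))
            if ri & wj:
                deps.add((si, sj, "anti"))
            if wi & wj:
                deps.add((si, sj, "output"))
    return deps

def is_valid(shuffle, deps):
    """A shuffle is valid if every dependency keeps its direction."""
    pos = {s: k for k, s in enumerate(shuffle)}
    return all(pos[a] < pos[b] for a, b, _ in deps)

deps = dependencies(ORDER, STMTS)
print(sorted(deps))
print(is_valid(["S4", "S1", "S2", "S3"], deps))   # True
print(is_valid(["S1", "S3", "S2", "S4"], deps))   # False: S3 overwrites c before S2 reads it
```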

  35. Application: Loop Fusion in “Vector” Mode
      Vector tangent code for the fragment a = 2.0*a + 10.0, b = c + sin(a), c = 0.0, with one loop over the ndt directions per statement:
        ...
        Do n = 1, ndt
          ȧ(n) = 2.0*ȧ(n)
        Enddo
        a = 2.0*a + 10.0
        Do n = 1, ndt
          ḃ(n) = ċ(n) + cos(a)*ȧ(n)
        Enddo
        b = c + sin(a)
        Do n = 1, ndt
          ċ(n) = 0.0
        Enddo
        c = 0.0
        ...
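
A Python sketch of why fusion is allowed here, with `ad`, `bd`, `cd` standing for the dotted arrays and arbitrary inputs. By the data dependencies, `a = 2.0*a + 10.0` can move above the first loop (that loop touches only the dotted `a`), and `b = c + sin(a)` can move below the last loop (it does not touch the dotted arrays, and it still precedes `c = 0.0`). The three loops then become adjacent and can be fused into one. This reordering is derived here from the dependency rules; it is not necessarily the code Tapenade produces.

```python
import math

def unfused(a, b, c, ad, bd, cd):
    """Vector tangent code as on the slide: one loop per statement."""
    for n in range(len(ad)):
        ad[n] = 2.0 * ad[n]
    a = 2.0 * a + 10.0
    for n in range(len(ad)):
        bd[n] = cd[n] + math.cos(a) * ad[n]
    b = c + math.sin(a)
    for n in range(len(ad)):
        cd[n] = 0.0
    c = 0.0
    return a, b, c, ad, bd, cd

def fused(a, b, c, ad, bd, cd):
    """Same computation after moving the original statements and fusing."""
    a = 2.0 * a + 10.0
    for n in range(len(ad)):
        ad[n] = 2.0 * ad[n]
        bd[n] = cd[n] + math.cos(a) * ad[n]   # uses ad[n] just updated, as before
        cd[n] = 0.0                           # cd[n] already read above
    b = c + math.sin(a)                       # still reads c before c = 0.0
    c = 0.0
    return a, b, c, ad, bd, cd

r1 = unfused(1.5, 0.0, -2.0, [1.0, -0.5, 2.0], [0.0] * 3, [0.25, 1.0, -3.0])
r2 = fused(1.5, 0.0, -2.0, [1.0, -0.5, 2.0], [0.0] * 3, [0.25, 1.0, -3.0])
print(r1 == r2)   # True
```

Fusing the loops traverses the ndt directions once instead of three times, which is the point of the transformation in vector mode.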
