The TAPENADE AD Tool
Laurent Hascoët, Valérie Pascual, Rose-Marie Greborio
Laurent.Hascoet@sophia.inria.fr
Tropics Project, INRIA Sophia-Antipolis
AD Workshop, Cranfield, June 5-6, 2003

PLAN:
• AD: principles of Tangent and Reverse
• Tapenade: technology from Compilation and Parallelization
• Call Graphs, Flow Graphs, Symbol Tables
• Static Analyses on Flow Graphs
• Dependency Analysis
• Tapenade: an AD tool on the web
• Further Developments

AD: Principles of Tangent and Reverse

AD rewrites source programs to make them compute derivatives.

consider: $P : \{I_1; I_2; \dots I_p;\}$ implementing $f : \mathbb{R}^m \to \mathbb{R}^n$
identify with: $f = f_p \circ f_{p-1} \circ \dots \circ f_1$
name: $x_0 = x$ and $x_k = f_k(x_{k-1})$
chain rule: $f'(x) = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0)$

$f'(x)$ is generally too large and expensive ⇒ take useful views!

tangent AD: $\dot{y} = f'(x) \cdot \dot{x} = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0) \cdot \dot{x}$
reverse AD: $\bar{x} = f'^{*}(x) \cdot \bar{y} = f'^{*}_1(x_0) \cdot \ldots \cdot f'^{*}_{p-1}(x_{p-2}) \cdot f'^{*}_p(x_{p-1}) \cdot \bar{y}$

Evaluate both from right to left!
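
To make the two views concrete, here is a minimal Python sketch (not Tapenade code). The three elementary steps and their hand-written Jacobians are illustrative assumptions, as are the names `steps`, `tangent`, and `reverse`. Both functions apply the matrix-vector products from right to left: the tangent product can be interleaved with the original computation, whereas the reverse product needs the intermediate values $x_k$ in the opposite order and therefore stores them first.

```python
import math

def matvec(J, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]

def transpose(J):
    return [list(col) for col in zip(*J)]

# Hypothetical chain f = f3 . f2 . f1 on R^2; each entry is (f_k, f_k').
steps = [
    (lambda x: [x[0] * x[1], x[1]],        lambda x: [[x[1], x[0]], [0.0, 1.0]]),
    (lambda x: [math.sin(x[0]), x[1]],     lambda x: [[math.cos(x[0]), 0.0], [0.0, 1.0]]),
    (lambda x: [x[0] + x[1], x[0] * x[1]], lambda x: [[1.0, 1.0], [x[1], x[0]]]),
]

def tangent(x, xdot):
    """ydot = f'_p(x_{p-1}) ... f'_1(x_0) xdot, computed alongside f."""
    for f, jac in steps:
        xdot = matvec(jac(x), xdot)  # uses x_{k-1}, available right now
        x = f(x)
    return x, xdot

def reverse(x, ybar):
    """xbar = f'*_1(x_0) ... f'*_p(x_{p-1}) ybar; needs x_k in reverse order."""
    stored = []
    for f, _ in steps:               # forward sweep: store every x_{k-1}
        stored.append(x)
        x = f(x)
    xbar = ybar
    for (_, jac), xk in zip(reversed(steps), reversed(stored)):
        xbar = matvec(transpose(jac(xk)), xbar)  # backward sweep
    return x, xbar
```

A quick consistency check for such a pair is the dot-product identity $\bar{y} \cdot \dot{y} = \bar{x} \cdot \dot{x}$, which holds for any choice of $\dot{x}$ and $\bar{y}$.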
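
AD: Example

    ...
    $v_2 = 2 * v_1 + 5$
    $v_4 = v_2 + p_1 * v_3 / v_2$
    ...

The corresponding (fragment of) Jacobian is (blank entries are zero; variables ordered $v_1, v_2, v_3, v_4$):

$$f'(x) = \dots
\begin{pmatrix}
1 & & & \\
 & 1 & & \\
 & & 1 & \\
0 & 1 - \frac{p_1 * v_3}{v_2^2} & \frac{p_1}{v_2} & 0
\end{pmatrix}
\begin{pmatrix}
1 & & & \\
2 & 0 & & \\
 & & 1 & \\
 & & & 1
\end{pmatrix}
\dots$$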

Tangent AD keeps the structure of P:

$\dot{y} = f'(x) \cdot \dot{x} = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0) \cdot \dot{x}$

    ...
    $\dot{v}_2 = 2 * \dot{v}_1$
    $v_2 = 2 * v_1 + 5$
    $\dot{v}_4 = \dot{v}_2 * (1 - p_1 * v_3 / v_2^2) + \dot{v}_3 * p_1 / v_2$
    $v_4 = v_2 + p_1 * v_3 / v_2$
    ...

just inserts the products $\dot{x}_k = f'_k(x_{k-1}) \cdot \dot{x}_{k-1}$ for $k = 1$ to $p$.
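
The tangent fragment can be checked numerically. The following Python sketch transcribes the two original statements and their tangent counterparts; wrapping them in functions, and the `d` suffix standing for the dot, are assumptions made only for this illustration.

```python
def fragment(v1, v3, p1):
    """Original program fragment."""
    v2 = 2 * v1 + 5
    v4 = v2 + p1 * v3 / v2
    return v4

def fragment_tangent(v1, v1d, v3, v3d, p1):
    """Tangent fragment: each derivative statement precedes its original."""
    v2d = 2 * v1d
    v2 = 2 * v1 + 5
    v4d = v2d * (1 - p1 * v3 / v2**2) + v3d * p1 / v2
    v4 = v2 + p1 * v3 / v2
    return v4, v4d
```

Seeding `v1d = 1, v3d = 0` yields the derivative of `v4` with respect to `v1`, which should agree with a central finite difference of `fragment` up to truncation and rounding error.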

AD: Reverse is more tricky than Tangent

$\bar{x} = f'^{*}(x) \cdot \bar{y} = f'^{*}_1(x_0) \cdot \ldots \cdot f'^{*}_{p-1}(x_{p-2}) \cdot f'^{*}_p(x_{p-1}) \cdot \bar{y}$

forward sweep (time runs downward):
    $x_0$;
    $x_1 = f_1(x_0)$;
    ...
    $x_{p-2} = f_{p-2}(x_{p-3})$;
    $x_{p-1} = f_{p-1}(x_{p-2})$;
backward sweep:
    $\bar{y}$
    $f'^{*}_p(x_{p-1}) \cdot$   (retrieve $x_{p-1}$)
    $f'^{*}_{p-1}(x_{p-2}) \cdot$   (retrieve $x_{p-2}$)
    ...
    $\bar{x} = f'^{*}_1(x_0) \cdot$   (retrieve $x_0$)

Memory usage (“Tape”) is the bottleneck!

AD: Continued Example

Program fragment:

    ...
    $v_2 = 2 * v_1 + 5$
    $v_4 = v_2 + p_1 * v_3 / v_2$
    ...

Corresponding transposed Partial Jacobians (blank entries are zero; variables ordered $v_1, v_2, v_3, v_4$):

$$f'^{*}(x) = \dots
\begin{pmatrix}
1 & 2 & & \\
 & 0 & & \\
 & & 1 & \\
 & & & 1
\end{pmatrix}
\begin{pmatrix}
1 & & & 0 \\
 & 1 & & 1 - \frac{p_1 * v_3}{v_2^2} \\
 & & 1 & \frac{p_1}{v_2} \\
 & & & 0
\end{pmatrix}
\dots$$

AD: Reverse mode on the example

    ...
    Push($v_2$)
    $v_2 = 2 * v_1 + 5$
    Push($v_4$)
    $v_4 = v_2 + p_1 * v_3 / v_2$
    ...
    ...
    Pop($v_4$)
    $\bar{v}_2 = \bar{v}_2 + \bar{v}_4 * (1 - p_1 * v_3 / v_2^2)$
    $\bar{v}_3 = \bar{v}_3 + \bar{v}_4 * p_1 / v_2$
    $\bar{v}_4 = 0$
    Pop($v_2$)
    $\bar{v}_1 = \bar{v}_1 + 2 * \bar{v}_2$
    $\bar{v}_2 = 0$
    ...
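
The Push/Pop pattern can be traced with an ordinary stack. This Python sketch is a hand transcription of the slide, not generated Tapenade output; the function name, the `b` suffix standing for the bar, and the use of a Python list as the tape are assumptions for illustration.

```python
def fragment_reverse(v1, v2, v3, v4, p1, v1b, v2b, v3b, v4b):
    """Forward sweep with Push, then backward sweep with Pop."""
    stack = []

    # Forward sweep: save each value just before it is overwritten.
    stack.append(v2)                         # Push(v2)
    v2 = 2 * v1 + 5
    stack.append(v4)                         # Push(v4)
    v4 = v2 + p1 * v3 / v2

    # Backward sweep: restore values, then run the adjoint statements.
    v4 = stack.pop()                         # Pop(v4)
    v2b = v2b + v4b * (1 - p1 * v3 / v2**2)  # v2 still holds 2*v1 + 5 here
    v3b = v3b + v4b * p1 / v2
    v4b = 0.0
    v2 = stack.pop()                         # Pop(v2)
    v1b = v1b + 2 * v2b
    v2b = 0.0
    return v1b, v2b, v3b, v4b
```

Seeding `v4b = 1` with all other adjoints at zero returns, in one pass, the derivatives of `v4` with respect to both `v1` and `v3`; tangent mode would need one run per input.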

AD: The Checkpointing tactic

A Storage/Recomputation tradeoff.
Tapenade does it on the Call Graph.
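
The tradeoff can be illustrated on a scalar chain of `p` steps. The Python sketch below is a simplification under stated assumptions: it places checkpoints at fixed-length segments of a straight-line chain, whereas Tapenade places them on the call graph; the functions `step` and `dstep` are hypothetical stand-ins for $f_k$ and $f'_k$. The checkpointed variant keeps one value per segment during the forward sweep, then recomputes each segment just before reversing it.

```python
import math

def step(x):
    """Hypothetical elementary step f_k."""
    return math.sin(x) + 0.5 * x

def dstep(x):
    """Its derivative f_k'(x)."""
    return math.cos(x) + 0.5

def reverse_store_all(x0, p, ybar=1.0):
    """Plain reverse mode: tape every intermediate value."""
    tape, x = [], x0
    for _ in range(p):
        tape.append(x)
        x = step(x)
    peak = len(tape)
    xbar = ybar
    while tape:
        xbar *= dstep(tape.pop())
    return xbar, peak

def reverse_checkpointed(x0, p, seg, ybar=1.0):
    """Store one checkpoint per segment; recompute a segment to reverse it."""
    checkpoints, x = [], x0
    for k in range(p):
        if k % seg == 0:
            checkpoints.append((k, x))
        x = step(x)
    xbar, end, peak = ybar, p, len(checkpoints)
    while checkpoints:
        start, x = checkpoints.pop()
        tape = []
        for _ in range(start, end):      # recomputation of one segment
            tape.append(x)
            x = step(x)
        peak = max(peak, len(checkpoints) + len(tape))
        while tape:
            xbar *= dstep(tape.pop())
        end = start
    return xbar, peak
```

Both variants return the same derivative. With `p = 100` and `seg = 10`, the peak number of stored values drops from 100 to 19, at the price of running each step a second time.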

Tapenade: Internal Representation

Take advantage of well-known techniques from Compilation and Parallelization:
• Use a general abstract Imperative Language (IL)
• Represent programs as Call Graphs of Flow Graphs
• Store symbol declarations in nested Symbol Tables

Application: Inversion of the Flow Graph

Application: Loop Inversion

Tapenade: Global Static Analyses on Flow Graphs
• classical IN-OUT analysis.
• forward dependence with respect to independent inputs.
• backward influence on dependent outputs.
• specific TBR analysis for the reverse mode.
• . . . pointer analysis . . .
Usual restrictions: conservative assumptions, arrays . . .

Application: reduced snapshots

Snapshot = IN(checkpoint) ∩ OUT(checkpoint and after)
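
Reading the operator in the snapshot formula as a set intersection, the rule can be sketched with Python sets. The function name and the variable names in the usage below are hypothetical; the IN and OUT sets are supplied by hand here, whereas in the tool they would come from the IN-OUT analysis.

```python
def snapshot(in_checkpoint, out_checkpoint, out_after):
    """Variables worth saving before a checkpointed piece of code.

    A variable needs saving only if the checkpoint reads it (IN) and it is
    overwritten by the checkpoint or by the code that follows it (OUT).
    """
    return in_checkpoint & (out_checkpoint | out_after)
```

For example, with `IN = {"a", "b", "n"}`, `OUT(checkpoint) = {"b", "c"}` and `OUT(after) = {"a", "d"}`, the snapshot is `{"a", "b"}`: `n` is read but never overwritten, so it does not need to be stored.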

Tapenade: Using Data Dependencies

flow: write x → read x
anti: read x → write x
output: write x → write x

Data Dependencies form
• a partial order between run-time instructions.
• a graph between textual instructions.

Any shuffle of instructions that respects Data Dependencies is valid!
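
The three kinds of dependency can be computed from the read and write sets of two instructions. The Python sketch below is an illustration only: representing an instruction as a `(reads, writes)` pair of sets of scalar names is an assumption, and it ignores the array and pointer issues a real analysis must treat conservatively.

```python
def dependencies(first, second):
    """Kinds of data dependency from `first` to a later instruction `second`.

    Each instruction is a (reads, writes) pair of sets of variable names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    kinds = set()
    if writes1 & reads2:
        kinds.add("flow")      # write x -> read x
    if reads1 & writes2:
        kinds.add("anti")      # read x -> write x
    if writes1 & writes2:
        kinds.add("output")    # write x -> write x
    return kinds

def can_swap(first, second):
    """Two adjacent instructions may be exchanged if no dependency links them."""
    return not dependencies(first, second)
```

Taking `a = 2.0*a + 10.0`, `b = c + sin(a)`, `c = 0.0` as the three instructions, the first two are linked by a flow dependency on `a`, the last two by an anti dependency on `c`, while the first and the third are independent and could be exchanged if they were adjacent.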

Application: Loop Fusion in “Vector” Mode

Original fragment:

    ...
    a = 2.0 * a + 10.0
    b = c + sin(a)
    c = 0.0
    ...

Tangent fragment in “Vector” mode, one loop per derivative statement:

    ...
    Do n = 1, ndt
        $\dot{a}(n) = 2.0 * \dot{a}(n)$
    Enddo
    a = 2.0 * a + 10.0
    Do n = 1, ndt
        $\dot{b}(n) = \dot{c}(n) + \cos(a) * \dot{a}(n)$
    Enddo
    b = c + sin(a)
    Do n = 1, ndt
        $\dot{c}(n) = 0.0$
    Enddo
    c = 0.0
    ...
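
A hedged Python sketch of the fusion follows. The unfused function transcribes the three separate loops; the fused function is one valid shuffle that respects the data dependencies (the original statements never read a derivative, and the derivative loops read only the updated `a`), not necessarily the ordering Tapenade emits. The function names and the `d` suffix standing for the dot are assumptions.

```python
import math

def unfused(a, c, ad, cd):
    """Vector tangent mode with one loop over directions per statement."""
    ndt = len(ad)
    ad, cd, bd = list(ad), list(cd), [0.0] * ndt
    for n in range(ndt):
        ad[n] = 2.0 * ad[n]
    a = 2.0 * a + 10.0
    for n in range(ndt):
        bd[n] = cd[n] + math.cos(a) * ad[n]
    b = c + math.sin(a)
    for n in range(ndt):
        cd[n] = 0.0
    c = 0.0
    return a, b, c, ad, bd, cd

def fused(a, c, ad, cd):
    """Same computation after moving the original statements and fusing loops."""
    ndt = len(ad)
    ad, cd, bd = list(ad), list(cd), [0.0] * ndt
    a = 2.0 * a + 10.0
    b = c + math.sin(a)
    c = 0.0
    for n in range(ndt):
        ad[n] = 2.0 * ad[n]
        bd[n] = cd[n] + math.cos(a) * ad[n]
        cd[n] = 0.0
    return a, b, c, ad, bd, cd
```

Both functions perform the same floating-point operations on the same operands, so they return identical results; the fused version traverses the derivative arrays once instead of three times.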