Certification of Derivatives Computed by Automatic Differentiation
Mauricio Araya Polo & Laurent Hascoët, Project TROPICS
WSEAS, Cancún, México, May 13, 2005
Plan
• Introduction (Background)
• Automatic Differentiation
  – Direct Mode
  – Reverse Mode
• The Problem
  – Example
• Our Approach
  – Description
  – Numerical Result
• Conclusions
• Future Work
Introduction (Background)
• Automatic Differentiation (A.D.): Given a program that evaluates a function F, A.D. builds a new program that evaluates the derivatives of F.
• Scientific Applications: Derivatives are useful in optimization, sensitivity analysis and inverse problems.
• Non-differentiability: Introduced in programs by conditional statements (tests). It may produce wrong derivatives.
• Lack of Validation: Neither A.D. models nor A.D. tools include verification of the differentiability of the functions.
• Novel A.D. model with Validation: We evaluate an interval around the input data in which no non-differentiability problem arises; this information is propagated through conditional statements.
Automatic Differentiation
Program Structure: a concatenated sequence of instructions $I_i$,
  $P = I_1; I_2; \ldots; I_{p-1}; I_p$
but with control flow (flowgraph): depending on the inputs, the example program might be
  $P = I_1; T_1; I_2; I_4$  or  $P = I_1; T_1; I_3; I_4$
where instruction $T_1$ represents the conditional statement (test).
[Figure: flowgraph of the example program. $I_1$ is followed by the test $T_1$, which branches to $I_2$ or $I_3$; both branches join at $I_4$.]
Mathematical Model: a composition of elementary functions $f_i$,
  $Y = F(X) = f_p \circ f_{p-1} \circ \ldots \circ f_2 \circ f_1$
Program P evaluates the model F: every function $f_i$ has a computational representation $I_i$, in the right order.
Automatic Differentiation (2)
Direct Mode: directional derivatives.
  $Y' = F'(X) \cdot dX = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0) \cdot dX$
with $x_i = (f_i \circ \ldots \circ f_1)(X)$, $x_0 = X$, and $f'_i()$ the Jacobians. Then the new program P' is
  $P' = I'_1; I_1; I'_2; I_2; \ldots; I'_{p-1}; I_{p-1}; I'_p$
with $I'_i$ corresponding to $f'_i()$.
Flowgraph again: depending on the inputs, the differentiated example program might be
  $P' = I'_1; I_1; T_1; I'_2; I_2; I'_4; I_4$  or  $P' = I'_1; I_1; T_1; I'_3; I_3; I'_4; I_4$
[Figure: flowgraph of the differentiated example program. $I'_1; I_1$ is followed by the test $T_1$, which branches to $I'_2; I_2$ or $I'_3; I_3$; both branches join at $I'_4; I_4$.]
The differentiated example program retains the control flow structure of the original program.
Automatic Differentiation (3)

Original Code:
       subroutine sub1(x,y,o1)
  I1   x = y*x
  I2   o1 = x*x + y*y
  T1   if (o1 > 190) then
  I3     o1 = -o1*o1/2
       else
  I4     o1 = o1*o1*20
       endif
       end

Direct Differentiated Code:
       subroutine sub1_d(x, xd, y, yd, o1, o1d)
  I'1  xd = yd*x + y*xd
  I1   x = y*x
  I'2  o1d = 2*x*xd + 2*y*yd
  I2   o1 = x*x + y*y
  T1   if (o1 > 190) then
  I'3    o1d = -(o1d*o1)
  I3     o1 = -(o1*o1/2)
       else
  I'4    o1d = 40*o1d*o1
  I4     o1 = o1*o1*20
       endif
       end

Table 1: Example of Direct Mode of AD.
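As a cross-check of Table 1, the sketch below transcribes the two subroutines into Python and compares the tangent o1d with a central finite difference. The Python function names, the helper central_difference, the step size h and the sample point are illustrative assumptions, not part of the slides.

```python
def sub1(x, y):
    """Original code of Table 1; returns o1."""
    x = y * x                      # I1
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1 = -o1 * o1 / 2          # I3
    else:
        o1 = o1 * o1 * 20          # I4
    return o1


def sub1_d(x, xd, y, yd):
    """Direct (tangent) mode code of Table 1; returns (o1, o1d)."""
    xd = yd * x + y * xd           # I'1
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d


def central_difference(x, y, xd, yd, h=1e-6):
    """Hypothetical helper: finite-difference estimate of the directional derivative."""
    forward = sub1(x + h * xd, y + h * yd)
    backward = sub1(x - h * xd, y - h * yd)
    return (forward - backward) / (2 * h)


# Sample point well away from the switch o1 = 190, direction (xd, yd) = (1, 1).
o1, o1d = sub1_d(2.0, 1.0, 1.5, 1.0)
print(o1, o1d, central_difference(2.0, 1.5, 1.0, 1.0))
```

Away from the test the two estimates agree closely; the finite difference stops being meaningful once the two sample points fall on different sides of T1.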
Automatic Differentiation (3)
Reverse Mode: adjoints, gradients.
  $\bar{X} = F'^{*}(X) \cdot \bar{Y} = f'^{*}_1(x_0) \cdot f'^{*}_2(x_1) \cdot \ldots \cdot f'^{*}_p(x_{p-1}) \cdot \bar{Y}$
Then the new program $\bar{P}$ (or $P'^{t}$) is
  $\bar{P} = \overrightarrow{P}; \overleftarrow{P} = I_1; I_2; \ldots; I_{p-1}; I_p; \bar{I}_p; \bar{I}_{p-1}; \ldots; \bar{I}_2; \bar{I}_1$
with $\bar{I}_i$ corresponding to $f'^{*}_i()$.
Remark: The reverse sweep ($\overleftarrow{P}$) eventually needs some values of the forward sweep ($\overrightarrow{P}$), but $x_0$ and other $x_i$ might be modified by the forward sweep, so we have to store them, which for some programs leads to significant memory consumption.
Automatic Differentiation (4)

Original Code:
       subroutine sub1(x,y,o1)
  I1   x = y*x
  I2   o1 = x*x + y*y
  T1   if (o1 > 190) then
  I3     o1 = -o1*o1/2
       else
  I4     o1 = o1*o1*20
       endif
       end

Reverse Differentiated Code:
       subroutine sub1_b(x, xb, y, yb, o1, o1b)
       PUSH(x)
  I1   x = y*x
  I2   o1 = x*x + y*y
  T1   if (o1 > 190) then
  ←I3    o1b = -(o1*o1b)
       else
  ←I4    o1b = 40*o1*o1b
       endif
  ←I2  xb = xb + 2*x*o1b
       yb = yb + 2*y*o1b
       POP(x)
  ←I1  yb = yb + x*xb
       xb = y*xb
       end

Table 2: Example of Reverse Mode of AD.
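The reverse code of Table 2 can be exercised with a small Python transcription. This is a sketch under assumptions: PUSH/POP are modeled with a Python list, the adjoints are returned instead of being updated in place, and the sample point is an arbitrary choice away from the switch.

```python
def sub1_b(x, xb, y, yb, o1b):
    """Reverse (adjoint) mode code of Table 2; returns the updated (xb, yb)."""
    stack = []
    # Forward sweep: overwritten values needed later are saved.
    stack.append(x)                # PUSH(x)
    x = y * x                      # I1
    o1 = x * x + y * y             # I2
    # Reverse sweep: adjoints of the instructions in reverse order.
    if o1 > 190:                   # T1
        o1b = -(o1 * o1b)          # adjoint of I3
    else:
        o1b = 40 * o1 * o1b        # adjoint of I4
    xb = xb + 2 * x * o1b          # adjoint of I2
    yb = yb + 2 * y * o1b
    x = stack.pop()                # POP(x): restore x as it was before I1
    yb = yb + x * xb               # adjoint of I1
    xb = y * xb
    return xb, yb


# Seeding o1b = 1 with zero input adjoints yields the gradient of o1.
xb, yb = sub1_b(2.0, 0.0, 1.5, 0.0, 1.0)
print(xb, yb)
```

One reverse sweep returns both partial derivatives; their sum along the direction (1,1) equals the tangent value obtained from the direct code at the same point.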
The Problem
Motivation: Whether derivatives are valid only within a certain domain is a crucial question for AD. If derivatives returned by AD are used outside their domain of validity, the resulting errors can be very hard to detect.
Description:
• Programs have a control flow structure, including conditional statements (tests). Some of the tests are introduced by intrinsic functions such as abs, min, max, etc.
• The differentiated program keeps the control flow structure of the given program. Sometimes the derivatives depend on the control flow structure.
• When some input is too close to a switch of the control flow, the resulting derivative may be very different or wrong, to the point of being useless.
The Problem (2)
[Figure, left: evaluation of program P, o1 plotted against x and y.]
[Figure, right: evaluation of program P' with xd,yd = 1,1, o1d plotted against x.]
The left plot shows the evaluation of the example program, with its discontinuity problem. The right plot shows the evaluation of the differentiated example program along the input-space direction (1,1).
(x=3.64, o1d=1512117.125) and (x=3.65, o1d=-38513.449) !!!
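The jump quoted above can be reproduced approximately from the tangent code of Table 1. The sketch assumes y = x, which the slide does not state, and the function name tangent_o1d is hypothetical; under that assumption double-precision Python gives values close to, but not identical with, the quoted ones.

```python
def tangent_o1d(x, y, xd, yd):
    """Tangent code of Table 1; returns (branch taken at T1, o1d)."""
    xd = yd * x + y * xd
    x = y * x
    o1d = 2 * x * xd + 2 * y * yd
    o1 = x * x + y * y
    if o1 > 190:
        return "then", -(o1d * o1)
    return "else", 40 * o1d * o1


# Two inputs 0.01 apart, on opposite sides of the switch o1 = 190.
for x in (3.64, 3.65):
    branch, o1d = tangent_o1d(x, x, 1.0, 1.0)
    print(f"x = y = {x}: branch = {branch}, o1d = {o1d:.3f}")
```

The two samples take different branches of T1, which is why the derivative changes sign and drops by almost two orders of magnitude over a step of 0.01.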
The Problem (3)
Main cases of problems introduced by conditional statements (from B. Kearfott's paper).
Our Approach
• Every test (t) is analyzed: under a small change in the input, the test must remain on the same "side" of the inequality. For example,
    if $t_i \geq 0$ then $\Delta t_i + t_i \geq 0$    (1)
• The variation of t ($\Delta t_i$) has to be expressed in terms of the intermediate variables ($B_i$), that is, the variables used by the instructions needed to build the current test:
    $\Delta t_i = J(T_i) \cdot \Delta B_i$
• The variation of the intermediate variables is
    $\Delta B_i = J(B_i; \ldots; B_0) \cdot \Delta X = J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X$
  where $\Delta X$ represents the variation of the input values.
• Re-composing the expression $\Delta t_i + t_i \geq 0$ from (1):
    $\langle J(T_i) \cdot J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X \mid e_j \rangle \geq -\langle t_i \mid e_j \rangle$    (2)
Our Approach (2)
• We want to isolate $\Delta X$; a good way to do that is to transpose the Jacobians in (2):
    $\langle \Delta X \mid J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j \rangle \geq -\langle t_i \mid e_j \rangle$    (3)
• We can use the reverse mode of AD to compute $J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j$ in (3).
• Unfortunately, in real situations the number of tests is so large that this computation is not practical.
• Solutions:
  – Combine the constraints (half-spaces) so that just one is propagated.
  – Reduce the size of the problem: fewer tests, fewer inputs, or both.
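As a worked instance of (3), consider the single test of the example program, assuming the test value is taken as $t = o1 - 190$ and writing $g$ (a symbol introduced here for illustration) for the vector produced by the reverse sweep of $I_2$ and $I_1$ with seed 1:

```latex
\begin{align*}
  t(x, y) &= (x y)^2 + y^2 - 190, \\
  g = J(B_0)^{*} \cdot J(T_0)^{*} \cdot 1
    &= \nabla t
     = \begin{pmatrix} 2 x y^2 \\ 2 x^2 y + 2 y \end{pmatrix}, \\
  \text{if } t \geq 0: \quad
  \langle \Delta X \mid g \rangle &\geq -t .
\end{align*}
```

For a fixed input point this is one half-space in $\Delta X$; each additional test contributes another half-space, which is where the cost noted above comes from.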
Our Approach (3)
• We analyze one test ($t_0$): under a small change in the input, the test must remain on the same "side" of the inequality.
    if $t_0 \geq 0$ then $\Delta t_0 + t_0 \geq 0$    (4)
• The variation of t ($\Delta t_0$) has to be expressed in terms of the intermediate variables ($B_0$):
    $\Delta t_0 = J(T_0) \cdot \Delta B_0$
• The variation of the intermediate variables is
    $\Delta B_0 = J(B_0) \cdot \beta \cdot \dot{X}$
  where $\beta \cdot \dot{X}$ represents the variation of the input values: $\beta$ is the magnitude and $\dot{X}$ the direction of the variation.
• Re-composing the expression (4):
    $\beta \cdot J(T_0) \cdot J(B_0) \cdot \dot{X} \geq -t_0$
Our Approach (4)
The following expression gives the magnitude of change of the input values that does not change the sign of the test:
    $\beta \geq \dfrac{-t_0}{J(T_0) \cdot J(B_0) \cdot \dot{X}}$    (5)
(for $J(T_0) \cdot J(B_0) \cdot \dot{X} > 0$; the inequality is reversed when this quantity is negative).
To compute expression (5) we introduce a function call that propagates the effect of every test through the program, resulting in an interval of validity, as follows:

Direct Differentiated Code:
       subroutine sub1_d(x,xd,y,yd,o1,o1d)
  I'1  xd = yd*x + y*xd
  I1   x = y*x
  I'2  o1d = 2*x*xd + 2*y*yd
  I2   o1 = x*x + y*y
  T1   if (o1 > 190) then
  I'3    o1d = -(o1d*o1)
  I3     o1 = -(o1*o1/2)
       else
  I'4    o1d = 40*o1d*o1
  I4     o1 = o1*o1*20
       endif
       end

Direct Differentiated Code with Validation:
       subroutine sub1_dva(x,xd,y,yd,o1,o1d)
  I'1  xd = yd*x + y*xd
  I1   x = y*x
  I'2  o1d = 2*x*xd + 2*y*yd
  I2   o1 = x*x + y*y
  V1   CALL VALIDITY_TEST(o1 - 190, o1d)
  T1   if (o1 > 190) then
  I'3    o1d = -(o1d*o1)
  I3     o1 = -(o1*o1/2)
       else
  I'4    o1d = 40*o1d*o1
  I4     o1 = o1*o1*20
       endif
       end
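The slides do not show the body of VALIDITY_TEST. The sketch below is one possible reading, not the authors' implementation: each call receives the test value t and its tangent td, computes the step -t/td at which the linearized test t + beta*td changes sign, and narrows an interval of admissible beta. The class ValidityInterval and its fields are hypothetical names, and y = x is assumed for the sample input.

```python
import math


class ValidityInterval:
    """Hypothetical accumulator: bounds (lower, upper) on the step size beta."""

    def __init__(self):
        self.lower = -math.inf
        self.upper = math.inf

    def validity_test(self, t, td):
        """Narrow the interval using the first-order model t + beta * td."""
        if td == 0.0:
            return                      # no first-order constraint from this test
        root = -t / td                  # beta at which the linearized test flips
        if root >= 0.0:
            self.upper = min(self.upper, root)
        if root <= 0.0:
            self.lower = max(self.lower, root)


def sub1_dva(x, xd, y, yd, interval):
    """Tangent code with validation, transcribed from the listing above."""
    xd = yd * x + y * xd                      # I'1
    x = y * x                                 # I1
    o1d = 2 * x * xd + 2 * y * yd             # I'2
    o1 = x * x + y * y                        # I2
    interval.validity_test(o1 - 190, o1d)     # V1
    if o1 > 190:                              # T1
        o1d = -(o1d * o1)                     # I'3
        o1 = -(o1 * o1 / 2)                   # I3
    else:
        o1d = 40 * o1d * o1                   # I'4
        o1 = o1 * o1 * 20                     # I4
    return o1, o1d


interval = ValidityInterval()
sub1_dva(3.64, 1.0, 3.64, 1.0, interval)
print(interval.lower, interval.upper)
```

Under these assumptions the upper bound at x = y = 3.64 in direction (1,1) is roughly 0.006, smaller than the 0.01 step to x = 3.65, which agrees with the branch switch between those two inputs. The bound is a first-order estimate only, so it is not a guaranteed enclosure.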