Certification of Derivatives Computed by Automatic Differentiation

Mauricio Araya Polo & Laurent Hascoët, Project TROPICS
WSEAS, Cancún, México, May 13, 2005

Plan
• Introduction (Background)
• Automatic Differentiation
  – Direct Mode
  – Reverse Mode
• The Problem
  – Example
• Our Approach
  – Description
  – Numerical Result
• Conclusions
• Future Work

Introduction (Background)
• Automatic Differentiation (A.D.): given a program that evaluates a function F, A.D. builds a new program that evaluates derivatives of F.
• Scientific Applications: derivatives are useful in optimization, sensitivity analysis and inverse problems.
• Non-differentiability: introduced in programs by conditional statements (tests). It may produce wrong derivatives.
• Lack of Validation: neither A.D. models nor A.D. tools include verification of the differentiability of the functions.
• Novel A.D. model with Validation: we evaluate an interval around the input data in which no non-differentiability problem arises; this information is propagated through conditional statements.

Automatic Differentiation
Program structure: a sequence of instructions $I_i$,
  $P = I_1; I_2; \ldots; I_{p-1}; I_p$
but with control flow (flowgraph): depending on the inputs, the example program might be
  $P = I_1; T_1; I_2; I_4$  or  $P = I_1; T_1; I_3; I_4$
where instruction $T_1$ represents the conditional statement (test).
[Figure: flowgraph of the example program, with nodes $I_1$, $T_1$, $I_2$, $I_3$, $I_4$.]
Mathematical model: a composition of elementary functions $f_i$,
  $Y = F(X) = f_p \circ f_{p-1} \circ \ldots \circ f_2 \circ f_1$
Program P evaluates the model F; every function $f_i$ has a computational representation $I_i$, in the same order.

Automatic Differentiation (2)
Direct Mode: directional derivatives.
  $Y' = F'(X) \cdot dX = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0) \cdot dX$
with $x_i = f_i \circ \ldots \circ f_1$ and $f'_i()$ the Jacobians. The new program P' is then
  $P' = I'_1; I_1; I'_2; I_2; \ldots; I'_{p-1}; I_{p-1}; I'_p$
with $I'_i$ corresponding to $f'_i()$.
Flowgraph again: depending on the inputs, the differentiated example program might be
  $P' = I'_1; I_1; T_1; I'_2; I_2; I'_4; I_4$  or  $P' = I'_1; I_1; T_1; I'_3; I_3; I'_4; I_4$
[Figure: flowgraph of the differentiated example program, each node $I_i$ replaced by the pair $I'_i; I_i$.]
The differentiated example program retains the control flow structure of the original program.

Automatic Differentiation (3)

Original Code                       Direct Differentiated Code
     subroutine sub1(x,y,o1)             subroutine sub1_d(x,xd,y,yd,o1,o1d)
I1   x = y*x                        I'1  xd = yd*x + y*xd
I2   o1 = x*x + y*y                 I1   x = y*x
T1   if (o1 > 190) then             I'2  o1d = 2*x*xd + 2*y*yd
I3     o1 = -o1*o1/2                I2   o1 = x*x + y*y
     else                           T1   if (o1 > 190) then
I4     o1 = o1*o1*20                I'3    o1d = -(o1d*o1)
     endif                          I3     o1 = -(o1*o1/2)
     end                                 else
                                    I'4    o1d = 40*o1d*o1
                                    I4     o1 = o1*o1*20
                                         endif
                                         end

Table 1: Example of Direct Mode of AD.
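The two listings of Table 1 can be transliterated into Python and checked against a finite difference. This is only a sketch, not output of the authors' tool: `sub1` and `sub1_d` mirror the slide, whereas the helper `fd_check`, its step `h` and the sample point are assumptions chosen for illustration.

```python
def sub1(x, y):
    """Original code of Table 1; returns o1."""
    x = y * x                      # I1
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1 = -o1 * o1 / 2          # I3
    else:
        o1 = o1 * o1 * 20          # I4
    return o1


def sub1_d(x, xd, y, yd):
    """Direct (tangent) mode code of Table 1; returns (o1, o1d)."""
    xd = yd * x + y * xd           # I'1, uses x before I1 overwrites it
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d


def fd_check(x, y, xd, yd, h=1e-6):
    """Hypothetical helper: central finite difference of sub1 along (xd, yd)."""
    return (sub1(x + h * xd, y + h * yd) - sub1(x - h * xd, y - h * yd)) / (2 * h)


if __name__ == "__main__":
    # Sample point chosen far from the switch o1 = 190.
    o1, o1d = sub1_d(2.0, 1.0, 1.5, 1.0)
    print(o1, o1d, fd_check(2.0, 1.5, 1.0, 1.0))
```

Away from the test the tangent value and the finite difference agree closely; the agreement is expected to break down when the two finite-difference evaluations fall on different branches of T1.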

Automatic Differentiation (3)
Reverse Mode: adjoints, gradients.
  $\bar{X} = F'^{*}(X) \cdot \bar{Y} = f'^{*}_1(x_0) \cdot f'^{*}_2(x_1) \cdot \ldots \cdot f'^{*}_p(x_{p-1}) \cdot \bar{Y}$
The new program $\bar{P}$ is then
  $\bar{P} = \overrightarrow{P}; \overleftarrow{P} = I_1; I_2; \ldots; I_{p-1}; I_p; \bar{I}_p; \bar{I}_{p-1}; \ldots; \bar{I}_2; \bar{I}_1$
with $\bar{I}_i$ corresponding to $f'^{*}_i()$, the transposed Jacobian.
Remark: the reverse sweep ($\overleftarrow{P}$) eventually needs some values of the forward sweep ($\overrightarrow{P}$), but $x_0$ and other $x_i$ might be modified by the forward sweep. We therefore have to store them, which for some programs leads to significant memory consumption.

Automatic Differentiation (4)

Original Code                       Reverse Differentiated Code
     subroutine sub1(x,y,o1)             subroutine sub1_b(x,xb,y,yb,o1,o1b)
I1   x = y*x                             PUSH(x)
I2   o1 = x*x + y*y                 I1   x = y*x
T1   if (o1 > 190) then             I2   o1 = x*x + y*y
I3     o1 = -o1*o1/2                T1   if (o1 > 190) then
     else                           ←I3    o1b = -(o1*o1b)
I4     o1 = o1*o1*20                     else
     endif                          ←I4    o1b = 40*o1*o1b
     end                                 endif
                                    ←I2  xb = xb + 2*x*o1b
                                         yb = yb + 2*y*o1b
                                         POP(x)
                                    ←I1  yb = yb + x*xb
                                         xb = y*xb
                                         end

Table 2: Example of Reverse Mode of AD.
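The reverse listing of Table 2 can be sketched in Python with an explicit list standing in for the PUSH/POP stack. The names follow the slide; returning `(xb, yb)` instead of updating arguments in place, the default seeds `xb = yb = 0` and the sample point are assumptions made for this illustration.

```python
def sub1_b(x, y, o1b, xb=0.0, yb=0.0):
    """Reverse (adjoint) mode code of Table 2; returns the updated (xb, yb)."""
    stack = []

    # Forward sweep: store the value of x that I1 is about to overwrite.
    stack.append(x)                # PUSH(x)
    x = y * x                      # I1
    o1 = x * x + y * y             # I2

    # Reverse sweep: adjoint of the branch actually taken by T1.
    if o1 > 190:                   # T1
        o1b = -(o1 * o1b)          # adjoint of I3
    else:
        o1b = 40 * o1 * o1b        # adjoint of I4

    # Adjoint of I2: o1 = x*x + y*y
    xb = xb + 2 * x * o1b
    yb = yb + 2 * y * o1b

    # Adjoint of I1: x = y*x, which needs the original x.
    x = stack.pop()                # POP(x)
    yb = yb + x * xb
    xb = y * xb
    return xb, yb


if __name__ == "__main__":
    xb, yb = sub1_b(2.0, 1.5, 1.0)   # seed o1b = 1 gives the gradient of o1
    print(xb, yb)                    # prints 4050.0 6750.0
```

With the seed `o1b = 1` the result is the gradient of `o1` with respect to `(x, y)`; its dot product with the direction (1, 1) is 10800, the directional derivative that the direct mode code of Table 1 yields at the same point.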

The Problem
Motivation: the question of derivatives being valid only in a certain domain is a crucial problem of AD. If derivatives returned by AD are used outside their domain of validity, this can result in errors that are very hard to detect.
Description:
• Programs have a control flow structure, including conditional statements (tests). Some of the tests are introduced by intrinsic functions like abs, min, max, etc.
• The differentiated program keeps the control flow structure of the given program. Sometimes the derivatives depend on the control flow structure.
• When some input is too close to a switch of the control flow, the resulting derivative may be very different or wrong, to the point of being useless.

The Problem (2)
[Figure, left panel: "Evaluation of program P.", o1 plotted over x and y. Right panel: "Evaluation of program P', xd,yd = 1,1.", o1d plotted against x.]
The left plot shows the evaluation of the example program, with its discontinuity problem. The right plot shows the evaluation of the differentiated example program along the input-space direction (1,1). Compare (x=3.64, o1d=1512117.125) and (x=3.65, o1d=-38513.449)!
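The jump quoted above can be reproduced approximately with a Python transliteration of the direct mode example. The slide does not state which y was used for the right-hand plot; the sketch below assumes y = x, a choice that gives values of the same sign and magnitude as the quoted ones but is not confirmed by the source.

```python
def sub1_d(x, xd, y, yd):
    """Direct (tangent) mode code of the example; returns (o1, o1d)."""
    xd = yd * x + y * xd           # I'1
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d


def derivative_on_diagonal(x):
    """Assumption: evaluate along y = x with direction (xd, yd) = (1, 1)."""
    return sub1_d(x, 1.0, x, 1.0)[1]


if __name__ == "__main__":
    for x in (3.64, 3.65):
        # The two inputs differ by 0.01 but land on different branches of T1.
        print(x, derivative_on_diagonal(x))
```

Under this assumption the derivative is roughly 1.5e6 at x = 3.64 and roughly -3.9e4 at x = 3.65: the two inputs sit on opposite sides of the test `o1 > 190`, so each derivative is valid only on its own side.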

The Problem (3)
Main cases of problems introduced by conditional statements (from B. Kearfott's paper).

Our Approach
• Every test (t) is analyzed: under a small change in the input, the test must remain on the same "side" of the inequality. For example,
  if $t_i \ge 0$ then $\Delta t_i + t_i \ge 0$  (1)
• The variation of t ($\Delta t_i$) has to be expressed in terms of the intermediate variables ($B_i$), i.e. the variables used by the instructions needed to build the current test:
  $\Delta t_i = J(T_i) \cdot \Delta B_i$
• The variation of the intermediate variables is
  $\Delta B_i = J(B_i; \ldots; B_0) \cdot \Delta X = J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X$
  where $\Delta X$ represents the variation of the input values.
• Re-composing the expression $\Delta t_i + t_i \ge 0$ from (1):
  $\langle J(T_i) \cdot J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X \mid e_j \rangle \ge -\langle t_i \mid e_j \rangle$  (2)

Our Approach (2)
• We want to isolate $\Delta X$; a good way to do that is to transpose the Jacobians in (2):
  $\langle \Delta X \mid J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j \rangle \ge -\langle t_i \mid e_j \rangle$  (3)
• We can use the reverse mode of AD to compute $J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j$ in (3).
• Unfortunately, in real situations the number of tests is so large that this approach is not practical to compute.
• Solutions:
  – combine constraints to propagate just one (half-spaces).
  – reduce the size of the problem: fewer tests, fewer inputs, or both.
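The step from (2) to (3) is the usual adjoint identity for an inner product. The short derivation below spells it out, assuming the real Euclidean inner product and writing $A$ (a symbol introduced here only for brevity) for the product of Jacobians in (2).

```latex
\begin{align*}
A &= J(T_i) \cdot J(B_i) \cdot \ldots \cdot J(B_0), \\
A^{*} &= J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*}, \\
\langle A \, \Delta X \mid e_j \rangle
  &= (A \, \Delta X)^{*} e_j
   = \Delta X^{*} A^{*} e_j
   = \langle \Delta X \mid A^{*} e_j \rangle .
\end{align*}
```

The vector $A^{*} e_j$ is exactly what one reverse sweep seeded with $e_j$ returns, which is why the reverse mode is the natural way to evaluate the left-hand side of (3).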

Our Approach (3)
• We analyze one test ($t_0$): under a small change in the input, the test must remain on the same "side" of the inequality.
  if $t_0 \ge 0$ then $\Delta t_0 + t_0 \ge 0$  (4)
• The variation of t ($\Delta t_0$) has to be expressed in terms of the intermediate variables ($B_0$):
  $\Delta t_0 = J(T_0) \cdot \Delta B_0$
• The variation of the intermediate variables is
  $\Delta B_0 = J(B_0) \cdot \beta \cdot \dot{X}$
  where $\beta \cdot \dot{X}$ represents the variation of the input values: $\beta$ is the magnitude and $\dot{X}$ the direction of the variation.
• Re-composing the expression (4):
  $\beta \cdot J(T_0) \cdot J(B_0) \cdot \dot{X} \ge -t_0$

Our Approach (4)
The following expression gives the magnitude of change of the input values that does not change the sign of the test:
  $\beta \ge \dfrac{-t_0}{J(T_0) \cdot J(B_0) \cdot \dot{X}}$  (5)
(as written, this assumes a positive denominator; the inequality reverses when the denominator is negative).
To compute expression (5) we introduce a function call that propagates the effect of every test through the program, resulting in an interval of validity, as follows:

Direct Differentiated Code                  Direct Differentiated Code with Validation
     subroutine sub1_d(x,xd,y,yd,o1,o1d)         subroutine sub1_dva(x,xd,y,yd,o1,o1d)
I'1  xd = yd*x + y*xd                       I'1  xd = yd*x + y*xd
I1   x = y*x                                I1   x = y*x
I'2  o1d = 2*x*xd + 2*y*yd                  I'2  o1d = 2*x*xd + 2*y*yd
I2   o1 = x*x + y*y                         I2   o1 = x*x + y*y
T1   if (o1 > 190) then                     V1   CALL VALIDITY_TEST(o1 - 190, o1d)
I'3    o1d = -(o1d*o1)                      T1   if (o1 > 190) then
I3     o1 = -(o1*o1/2)                      I'3    o1d = -(o1d*o1)
     else                                   I3     o1 = -(o1*o1/2)
I'4    o1d = 40*o1d*o1                           else
I4     o1 = o1*o1*20                        I'4    o1d = 40*o1d*o1
     endif                                  I4     o1 = o1*o1*20
     end                                         endif
                                                 end
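The slides do not show the body of VALIDITY_TEST, so the following Python sketch is only one plausible reading of expression (5): each call takes the test value t and its directional derivative td, computes the first-order step β = -t/td at which the test would change sign, and narrows a running interval of admissible β. The class `ValidityInterval`, its attributes and the return convention of `sub1_dva` are hypothetical names introduced for this illustration.

```python
import math


class ValidityInterval:
    """Hypothetical accumulator for the interval of beta keeping all tests unchanged."""

    def __init__(self):
        self.lo = -math.inf
        self.hi = math.inf

    def validity_test(self, t, td):
        """Narrow [lo, hi] using the first-order model t + beta * td of the test value."""
        if td == 0.0:
            return                      # to first order the test value does not move
        beta = -t / td                  # step at which the linearized test reaches zero
        if beta > 0:
            self.hi = min(self.hi, beta)
        elif beta < 0:
            self.lo = max(self.lo, beta)
        else:
            self.lo = self.hi = 0.0     # the input sits exactly on the switch


def sub1_dva(x, xd, y, yd):
    """Tangent code with validation; returns (o1, o1d, (beta_lo, beta_hi))."""
    v = ValidityInterval()
    xd = yd * x + y * xd                # I'1
    x = y * x                           # I1
    o1d = 2 * x * xd + 2 * y * yd       # I'2
    o1 = x * x + y * y                  # I2
    v.validity_test(o1 - 190, o1d)      # V1, placed just before the test T1
    if o1 > 190:                        # T1
        o1d = -(o1d * o1)               # I'3
        o1 = -(o1 * o1 / 2)             # I3
    else:
        o1d = 40 * o1d * o1             # I'4
        o1 = o1 * o1 * 20               # I4
    return o1, o1d, (v.lo, v.hi)


if __name__ == "__main__":
    # Assumed sample point x = y = 3.64 with direction (1, 1).
    print(sub1_dva(3.64, 1.0, 3.64, 1.0))
```

At the assumed point x = y = 3.64 with direction (1, 1), the sketch reports an upper bound on β of about 0.006, so a step of 0.01 in that direction leaves the interval in which the computed derivative can be trusted. The bound is first order only; a tool could reasonably apply a safety factor.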
