AN INTEGER PROGRAMMING FORMULATION OF THE MINIMAL JACOBIAN - PowerPoint PPT Presentation

2016 SIAM WORKSHOP ON COMBINATORIAL SCIENTIFIC COMPUTING, 10 OCTOBER 2016

AN INTEGER PROGRAMMING FORMULATION OF THE MINIMAL JACOBIAN REPRESENTATION PROBLEM

PAUL HOVLAND
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, IL 60439 USA

OUTLINE
§ Introduction to Automatic/Algorithmic Differentiation (AD)
§ A graph model of AD
§ Preaccumulation and scarcity
§ Experimental results
§ Conclusions

AUTOMATIC/ALGORITHMIC DIFFERENTIATION
AD in a Nutshell
§ Technique for computing analytic derivatives of functions computed by programs (potentially millions of lines of code)
§ Derivatives are used in optimization, nonlinear PDEs, sensitivity analysis, inverse problems, uncertainty quantification, etc.
§ AD = analytic differentiation of elementary functions + propagation by the chain rule
  – Every programming language provides a limited number of elementary mathematical functions
  – Thus, every function computed by a program may be viewed as the composition of these so-called intrinsic functions
  – Derivatives for the intrinsic functions are known and can be combined using the chain rule of differential calculus
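To make "elementary derivatives + chain rule" concrete, here is a minimal Python sketch using operator overloading on dual numbers (a value paired with a derivative). The `Dual` class and the `sin`/`exp` wrappers are hypothetical illustrations, not the interface of ADIFOR or any other AD tool, and only `+`, `*`, `sin`, and `exp` are supported. The test function is the simple example used later in the talk (a = exp(x), b = sin(y)*y, c = a*b, f = a*c).

```python
import math


class Dual:
    """A value paired with its derivative (tangent) along one seed direction."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


# Intrinsic functions: the analytic derivative is known and is
# combined with the incoming tangent by the chain rule.
def sin(u):
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)


def exp(u):
    e = math.exp(u.val)
    return Dual(e, e * u.dot)


def f(x, y):
    a = exp(x)
    b = sin(y) * y
    c = a * b
    return a * c


# df/dx at (x, y) = (0.5, 1.2): seed x with tangent 1 and y with tangent 0
r = f(Dual(0.5, 1.0), Dual(1.2, 0.0))
```

Seeding y with tangent 1 (and x with 0) would give df/dy instead.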

AUTOMATIC/ALGORITHMIC DIFFERENTIATION
Forward Mode AD
§ Start with the independent variables and follow the flow of the original function computation
§ Computes the Jacobian times a matrix S
§ Cost is proportional to the number of columns in S
§ Special case: Jv costs a small constant times the cost of the function
§ Ideal for functions with a small number of independent variables
§ Partial derivatives associated with intermediate variables are used at the same time as the variables themselves
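A hand-written tangent sweep shows why forward-mode cost scales with the number of columns of S: each column of S is one seed direction and needs one pass. This sketch assumes the simple example function from later in the talk, f = exp(x)^2 * y*sin(y), so J is 1x2; the names `f_tangent` and `jacobian_times` are hypothetical.

```python
import math


def f_tangent(x, y, xd, yd):
    """Forward (tangent) mode for f = exp(x)^2 * y*sin(y).

    Each tangent (suffix 'd') is computed right next to its value, so the
    local partials are used immediately and need not be stored.
    """
    a = math.exp(x)
    ad = a * xd
    t0 = math.sin(y)
    t0d = math.cos(y) * yd
    b = t0 * y
    bd = t0d * y + t0 * yd
    c = a * b
    cd = ad * b + a * bd
    f = a * c
    fd = ad * c + a * cd
    return f, fd


def jacobian_times(x, y, S):
    """Return J @ S for the 1x2 Jacobian J; S is given as a list of columns (sx, sy).

    One tangent sweep per column, so the cost grows with the number of columns.
    """
    return [f_tangent(x, y, sx, sy)[1] for (sx, sy) in S]


# S = identity (two columns) yields the full gradient [df/dx, df/dy]
JS = jacobian_times(0.5, 1.2, [(1.0, 0.0), (0.0, 1.0)])
```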

AUTOMATIC/ALGORITHMIC DIFFERENTIATION
Reverse/Adjoint Mode AD
§ Start with the dependent variables and propagate derivatives back to the independent variables
§ Computes a matrix W times the Jacobian
§ Cost is proportional to the number of rows in W
§ Special case: J^T v costs a small constant times the cost of the function
§ Ideal for functions with a small number of dependent variables
§ Intermediate partial derivatives must be stored or recomputed
  – In the worst case, storage grows with the number of operations in the function
§ Control flow must be reversed (must store or reproduce control flow decisions)
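A minimal tape-based sketch of reverse mode, assuming a global list `TAPE` and a hypothetical `Var` node type (not any real tool's API). The forward sweep records every operation together with its local partials, which is where storage grows with the operation count; the reverse sweep then yields the whole gradient in one pass (here W = [1], i.e. J^T v with v = 1). Recording the executed operations also records which control-flow path was taken.

```python
import math

TAPE = []  # every Var is appended in creation (topological) order


class Var:
    """Scalar node on the tape with its parents and local partial derivatives."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # sequence of (parent Var, d self / d parent)
        self.adj = 0.0          # adjoint, filled in by the reverse sweep
        TAPE.append(self)       # storage grows with the number of operations


def mul(u, v):
    return Var(u.val * v.val, [(u, v.val), (v, u.val)])


def sin(u):
    return Var(math.sin(u.val), [(u, math.cos(u.val))])


def exp(u):
    e = math.exp(u.val)
    return Var(e, [(u, e)])


def gradient(out):
    """Reverse sweep: walk the tape backwards, pushing adjoints to parents."""
    out.adj = 1.0
    for node in reversed(TAPE):
        for parent, partial in node.parents:
            parent.adj += partial * node.adj


# Forward sweep for the simple example, then one reverse sweep
x, y = Var(0.5), Var(1.2)
a = exp(x)
b = mul(sin(y), y)
c = mul(a, b)
f = mul(a, c)
gradient(f)  # x.adj = df/dx, y.adj = df/dy
```

Because the tape is in creation order, walking it backwards visits every node after all of its consumers, so each adjoint is complete before it is propagated.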

AUTOMATIC/ALGORITHMIC DIFFERENTIATION
AD Tool Implementation
[Figure: components of an AD tool: parse/unparse, domain-specific data-flow analyses, combinatorial algorithms]

AUTOMATIC/ALGORITHMIC DIFFERENTIATION
Combinatorial problems in AD
§ Minimize ops to compute Jacobian
  – Exploit chain rule associativity
  – Related to min fill-in in factorization
§ Minimal representation
  – Minimize edge count in DAG
  – Jacobian as the sum/product of sparse/low-rank matrices
§ Adjoint recompute/store tradeoff
  – G&W: minimize recomputation
  – Aupy et al.: minimize time
§ Matrix (graph) coloring
  – Minimize the number of columns of S in JS
[Figure: computational graph of the running example, with numbered vertices]
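Of these problems, Jacobian compression by coloring fits in a few lines. The sketch below is an illustrative greedy coloring of the column intersection graph, not the algorithm of any particular tool: columns that never share a nonzero row get the same color and can share one column of S. `color_columns` and `seed_matrix` are hypothetical names.

```python
def color_columns(pattern, n):
    """Greedy coloring of the column intersection graph.

    pattern: list of rows, each a set of column indices holding nonzeros.
    Two columns conflict if some row has nonzeros in both; columns with the
    same color are structurally orthogonal.
    """
    conflicts = [set() for _ in range(n)]
    for row in pattern:
        for j in row:
            conflicts[j] |= row - {j}
    color = [-1] * n
    for j in range(n):  # natural order; ordering heuristics can do better
        used = {color[k] for k in conflicts[j] if color[k] >= 0}
        c = 0
        while c in used:
            c += 1
        color[j] = c
    return color


def seed_matrix(color, n):
    """S[j][c] = 1 if column j has color c, so J @ S has one column per color."""
    ncolors = max(color) + 1
    return [[1 if color[j] == c else 0 for c in range(ncolors)] for j in range(n)]


# Tridiagonal 5x5 Jacobian: row i has nonzeros in columns i-1, i, i+1
n = 5
pattern = [{j for j in (i - 1, i, i + 1) if 0 <= j < n} for i in range(n)]
colors = color_columns(pattern, n)
S = seed_matrix(colors, n)
```

For this tridiagonal pattern the greedy pass finds 3 colors, so JS has 3 columns instead of 5, and each nonzero of J can be read off JS because no two same-colored columns overlap in a row.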

A GRAPH MODEL OF AD
Accumulating derivatives
§ Represent the function using a directed acyclic graph (DAG)
§ Computational graph
  – Vertices are intermediate variables, annotated with function/operator
  – Edges are unweighted
§ Linearized computational graph
  – Edge weights are partial derivatives
  – Vertex labels are not needed
§ Compute the sum of weights over all paths from independent to dependent variable(s), where the path weight is the product of the weights of all edges along the path [Baur & Strassen]
§ Find an order in which to compute path weights that minimizes cost (flops): identify common subpaths (= common subexpressions in the Jacobian)
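The path-sum definition can be evaluated without listing paths. This sketch (a hypothetical `path_sum` helper on an edge dictionary, not how AD tools store graphs) visits vertices in topological order and sets W[v] to the sum over in-edges (u, v) of W[u] * w(u, v); by distributivity this equals the sum over all paths of the product of their edge weights. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def path_sum(edges, source, target):
    """Sum over all source->target paths of the product of edge weights.

    edges: dict mapping (u, v) -> weight of the edge u->v in a DAG.
    """
    preds = {}
    for (u, v) in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    W = {source: 1.0}
    # static_order() yields every vertex after all of its predecessors
    for v in TopologicalSorter(preds).static_order():
        if v != source:
            W[v] = sum(W.get(u, 0.0) * edges[(u, v)] for u in preds[v])
    return W.get(target, 0.0)


# Diamond DAG with two paths s->u->t and s->v->t
edges = {("s", "u"): 2.0, ("s", "v"): 3.0, ("u", "t"): 5.0, ("v", "t"): 7.0}
value = path_sum(edges, "s", "t")  # 2*5 + 3*7
```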

A GRAPH MODEL OF AD
A simple example
  b = sin(y)*y
  a = exp(x)
  c = a*b
  f = a*c
[Figure: computational graph with independent vertices x and y and operator vertices sin, exp (a), * (b), * (c), * (f)]

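A GRAPH MODEL OF AD
A simple example
  t0 = sin(y)
  d0 = cos(y)
  b = t0*y
  a = exp(x)
  c = a*b
  f = a*c
[Figure: computational graph (left) and linearized computational graph (right); edge weights y→t0: d0, t0→b: y, y→b: t0, x→a: a, a→c: b, b→c: a, a→f: c, c→f: a]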

A GRAPH MODEL OF AD
Brute force
§ Compute products of edge weights along all paths
§ Sum all paths from the same source to the same target
§ Hope the compiler does a good job recognizing common subexpressions
  dfdy = d0*y*a*a + t0*a*a
  dfdx = a*b*a + a*c
§ Cost: 8 mults, 2 adds
[Figure: linearized graph with vertices y = v-1, x = v0, t0 = v1, b = v2, a = v3, c = v4, f = v5]
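The brute-force count can be checked mechanically. This sketch encodes the linearized graph of the simple example (edge weights as read from the slide's figure), enumerates every path, and counts one multiplication per extra edge on a path and one addition per extra path for each source/target pair. The helper names are hypothetical.

```python
import math


def linearized_graph(x, y):
    """Edges (u, v) -> local partial d v / d u for the simple example."""
    a, t0, d0 = math.exp(x), math.sin(y), math.cos(y)
    b = t0 * y
    c = a * b
    return {
        ("y", "t0"): d0, ("t0", "b"): y, ("y", "b"): t0,
        ("x", "a"): a, ("a", "c"): b, ("b", "c"): a,
        ("a", "f"): c, ("c", "f"): a,
    }


def all_paths(edges, u, target):
    """Yield every path from u to target as a list of edges."""
    if u == target:
        yield []
        return
    for (p, s) in edges:
        if p == u:
            for rest in all_paths(edges, s, target):
                yield [(p, s)] + rest


def brute_force(edges, source, target):
    """Sum of path products, with multiplication and addition counts."""
    total, mults, adds = 0.0, 0, 0
    for k, path in enumerate(all_paths(edges, source, target)):
        prod = edges[path[0]]
        for e in path[1:]:
            prod *= edges[e]
            mults += 1
        if k > 0:
            adds += 1
        total += prod
    return total, mults, adds


E = linearized_graph(0.5, 1.2)
dfdy, my, ay = brute_force(E, "y", "f")
dfdx, mx, ax = brute_force(E, "x", "f")
```

The totals come out to 8 multiplications and 2 additions, matching the slide, with no reuse of common subpaths such as a*a.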

A GRAPH MODEL OF AD
Vertex elimination
§ Multiply each in-edge by each out-edge, and add the product to the edge from the predecessor to the successor
§ Conserves path weights
§ This procedure always terminates
§ The terminal form is a bipartite graph

[Figure: after eliminating c, the edge b→f has weight a*a and the edge a→f has weight c + a*b]
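A sketch of vertex elimination on the simple example, assuming the edge weights from the linearized graph above and hypothetical helper names. Eliminating v multiplies every in-edge weight by every out-edge weight and adds the product to the predecessor-to-successor edge (creating it if absent); an addition is counted only when that edge already exists. Any elimination order ends in the bipartite graph {x, y} → {f}.

```python
import math


def linearized_graph(x, y):
    """Edges (u, v) -> local partial d v / d u for the simple example."""
    a, t0, d0 = math.exp(x), math.sin(y), math.cos(y)
    b = t0 * y
    c = a * b
    return {("y", "t0"): d0, ("t0", "b"): y, ("y", "b"): t0,
            ("x", "a"): a, ("a", "c"): b, ("b", "c"): a,
            ("a", "f"): c, ("c", "f"): a}


def eliminate(edges, v):
    """Eliminate vertex v in place; return (mults, adds) spent."""
    ins = [(p, w) for (p, q), w in edges.items() if q == v]
    outs = [(s, w) for (q, s), w in edges.items() if q == v]
    mults = adds = 0
    for p, wp in ins:
        for s, ws in outs:
            mults += 1
            if (p, s) in edges:
                edges[(p, s)] += wp * ws
                adds += 1
            else:
                edges[(p, s)] = wp * ws
    for p, _ in ins:
        del edges[(p, v)]
    for s, _ in outs:
        del edges[(v, s)]
    return mults, adds


def accumulate(order, x=0.5, y=1.2):
    """Eliminate intermediates in the given order; return final edges and cost."""
    edges = linearized_graph(x, y)
    mults = adds = 0
    for v in order:
        m_v, a_v = eliminate(edges, v)
        mults += m_v
        adds += a_v
    return edges, (mults, adds)


J, cost = accumulate(["t0", "b", "a", "c"])  # topological (forward) order
```

Passing the order ["c", "a", "b", "t0"] gives the reverse-mode count, and ["t0", "c", "b", "a"] the cross-country count; the final edge weights are the same in every case because elimination conserves path weights.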

A GRAPH MODEL OF AD
Forward mode: eliminate vertices in topological order
  t0 = sin(y)
  d0 = cos(y)
  b = t0*y
  a = exp(x)
  c = a*b
  f = a*c
  d1 = t0 + d0*y        (eliminate t0)
  d2 = d1*a             (eliminate b)
  d3 = a*b              (eliminate a)
  d4 = a*c
  dfdy = d2*a           (eliminate c)
  dfdx = d4 + d3*a
§ Cost: 6 mults, 2 adds
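The forward-mode sequence above, transcribed line by line into Python as a sketch (the function name `jac_forward` is hypothetical). The derivative statements use exactly 6 multiplications and 2 additions; the result can be checked against central finite differences.

```python
import math


def jac_forward(x, y):
    """Forward-mode elimination sequence for f = a*c, returning (f, dfdx, dfdy)."""
    t0 = math.sin(y)
    d0 = math.cos(y)
    b = t0 * y
    a = math.exp(x)
    c = a * b
    f = a * c
    d1 = t0 + d0 * y    # eliminate t0: edge y->b becomes t0 + d0*y
    d2 = d1 * a         # eliminate b: new edge y->c
    d3 = a * b          # eliminate a: new edge x->c
    d4 = a * c          # eliminate a: new edge x->f
    dfdy = d2 * a       # eliminate c
    dfdx = d4 + d3 * a  # eliminate c, adding into existing edge x->f
    return f, dfdx, dfdy


f, dfdx, dfdy = jac_forward(0.5, 1.2)
```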

A GRAPH MODEL OF AD
Reverse mode: eliminate vertices in reverse topological order
  t0 = sin(y)
  d0 = cos(y)
  b = t0*y
  a = exp(x)
  c = a*b
  f = a*c
  d1 = a*a              (eliminate c)
  d2 = c + b*a
  d3 = t0*d1            (eliminate b)
  d4 = y*d1
  dfdy = d3 + d0*d4     (eliminate t0)
  dfdx = a*d2           (eliminate a)
§ Cost: 6 mults, 2 adds
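The reverse-mode sequence as a Python sketch (hypothetical name `jac_reverse`). It reaches the same Jacobian as the forward sequence with the same 6 multiplications and 2 additions, but the products are grouped starting from f.

```python
import math


def jac_reverse(x, y):
    """Reverse-mode elimination sequence for f = a*c, returning (f, dfdx, dfdy)."""
    t0 = math.sin(y)
    d0 = math.cos(y)
    b = t0 * y
    a = math.exp(x)
    c = a * b
    f = a * c
    d1 = a * a           # eliminate c: new edge b->f
    d2 = c + b * a       # eliminate c: add into existing edge a->f
    d3 = t0 * d1         # eliminate b: new edge y->f
    d4 = y * d1          # eliminate b: new edge t0->f
    dfdy = d3 + d0 * d4  # eliminate t0, adding into edge y->f
    dfdx = a * d2        # eliminate a
    return f, dfdx, dfdy


f, dfdx, dfdy = jac_reverse(0.5, 1.2)
```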

A GRAPH MODEL OF AD
“Cross country” mode
  t0 = sin(y)
  d0 = cos(y)
  b = t0*y
  a = exp(x)
  c = a*b
  f = a*c
  d1 = t0 + d0*y        (eliminate t0)
  d2 = a*a              (eliminate c)
  d3 = c + b*a
  dfdy = d1*d2          (eliminate b)
  dfdx = a*d3           (eliminate a)
§ Cost: 5 mults, 2 adds
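For a graph this small, the best elimination order can be found by simply trying all 4! = 24 orders of the intermediates t0, b, a, c. This brute-force search is only an illustration (it enumerates n! orders and does not scale), and it is not the integer programming formulation named in the talk's title. It counts operations from the graph structure alone; the names are hypothetical.

```python
from itertools import permutations

# Structure of the example's linearized graph (weights do not affect counts)
EDGES = [("y", "t0"), ("t0", "b"), ("y", "b"), ("x", "a"),
         ("a", "c"), ("b", "c"), ("a", "f"), ("c", "f")]
INTERMEDIATES = ["t0", "b", "a", "c"]


def elimination_cost(order):
    """Multiplications and additions used by vertex elimination in this order."""
    succ = {}
    for u, v in EDGES:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    mults = adds = 0
    for v in order:
        preds = [p for p in succ if v in succ[p]]
        for p in preds:
            for s in succ[v]:
                mults += 1
                if s in succ[p]:
                    adds += 1        # edge p->s exists: accumulate into it
                else:
                    succ[p].add(s)   # fill-in: create edge p->s
            succ[p].discard(v)
        del succ[v]
    return mults, adds


costs = {order: elimination_cost(order) for order in permutations(INTERMEDIATES)}
best = min(m for m, _ in costs.values())
```

The minimum over all orders is 5 multiplications, the cross-country cost: c always has one predecessor from the x side and one from the y side, a costs at least one multiplication, and t0 and b cost one each only when t0 is eliminated first.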

PREACCUMULATION AND SCARCITY
Statement-Level Preaccumulation in ADIFOR
§ My first project at Argonne (1991)
§ Use forward mode as the overall strategy, but differentiate each statement using the reverse mode
§ Arose from the recognition by Bischof and Griewank that implementing pure forward mode would require allocation of temporary arrays
§ Reduces memory requirements and, frequently, the number of operations
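A Python sketch of the idea behind statement-level preaccumulation (ADIFOR itself generated Fortran; this is not its output). For a hypothetical statement z = x*y*sin(x), a reverse sweep over the statement yields the two scalar partials, and the gradient vectors are then combined once. Pure forward mode would instead need gradient-length temporaries for t1 and t2. The name `stmt_preacc` is illustrative.

```python
import math


def stmt_preacc(x, y, xg, yg):
    """Differentiate z = x*y*sin(x) with statement-level preaccumulation.

    xg, yg: gradient vectors of x and y with respect to the independents.
    """
    # Forward sweep over the statement's expression tree
    t1 = x * y
    t2 = math.sin(x)
    z = t1 * t2
    # Reverse sweep over the statement: scalar local adjoints only
    t1b = t2
    t2b = t1
    xb = t1b * y + t2b * math.cos(x)  # dz/dx
    yb = t1b * x                      # dz/dy
    # Global forward mode: one vector update using the preaccumulated partials
    zg = [xb * gx + yb * gy for gx, gy in zip(xg, yg)]
    return z, zg


z, zg = stmt_preacc(1.0, 2.0, [1.0, 0.0, 0.5], [0.0, 1.0, 0.5])
```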

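PREACCUMULATION AND SCARCITY
Basic-block level preaccumulation
§ For each basic block, first compute derivatives of the out variables with respect to the in variables of that basic block, then apply the chain rule to obtain derivatives of the out variables with respect to the independent variables (or of the dependent variables with respect to the in variables):

$$\frac{\partial\,\mathrm{out}}{\partial\,\mathrm{indep}} = \frac{\partial\,\mathrm{out}}{\partial\,\mathrm{in}}\,\frac{\partial\,\mathrm{in}}{\partial\,\mathrm{indep}}, \qquad \frac{\partial\,\mathrm{dep}}{\partial\,\mathrm{in}} = \frac{\partial\,\mathrm{dep}}{\partial\,\mathrm{out}}\,\frac{\partial\,\mathrm{out}}{\partial\,\mathrm{in}}$$

  – The local Jacobian ∂out/∂in is probably small, maybe sparse; ∂in/∂indep and ∂dep/∂out are likely large, probably dense
§ In the context of an overall reverse mode strategy, offers the potential to reduce the memory requirements for each basic block (store partial derivatives instead of intermediate variables)
§ Storage and reverse mode accumulation cost is proportional to the number of nonzeros in the preaccumulated Jacobian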
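A sketch of the forward chain-rule step for one basic block, assuming the preaccumulated local Jacobian ∂out/∂in is stored as a dictionary of its nonzeros and the incoming ∂in/∂indep is dense. Each stored nonzero touches one full row, so the multiplication count is nnz(local) times the number of columns, which is why fewer nonzeros in the preaccumulated Jacobian means less work. The block, its values, and `apply_local_jacobian` are hypothetical; the reverse case ∂dep/∂out times ∂out/∂in is the transpose of the same pattern.

```python
def apply_local_jacobian(local, incoming, n_out):
    """Return (d out / d indep, multiplication count).

    local:    dict {(i, j): d out_i / d in_j}, nonzeros only.
    incoming: dense d in / d indep as a list of rows, one per in variable.
    """
    ncols = len(incoming[0])
    result = [[0.0] * ncols for _ in range(n_out)]
    mults = 0
    for (i, j), w in local.items():
        row_in = incoming[j]
        for k in range(ncols):
            result[i][k] += w * row_in[k]
            mults += 1
    return result, mults


# Hypothetical block: 3 in variables, 2 out variables, 3 nonzero partials,
# and 4 independent variables upstream
local = {(0, 0): 2.0, (0, 2): -1.0, (1, 1): 4.0}
incoming = [[1.0, 0.0, 0.0, 2.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 3.0]]
out, mults = apply_local_jacobian(local, incoming, n_out=2)
```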
