Second Order Reverse Mode of AD: A Vertex Elimination Perspective
Mu Wang, Alex Pothen, and Paul Hovland
Computer Science, Purdue University; MCS Division, Argonne National Lab
Thanks: NSF, DOE, Intel
October 10, 2016
Wang et al. (Purdue University) Second Order Reverse AD October 10, 2016 1 / 21
Outline
◮ Second order reverse mode of Automatic Differentiation
◮ Vertex elimination for evaluating the gradient and the Hessian
◮ The correspondence between second order reverse mode and vertex elimination
◮ Discussion and board picture
AD Fundamentals
◮ Automatic Differentiation (AD) is a technique that augments a computer program so that the augmented program computes the derivatives as well as the values of the function defined by the original program.
◮ Scalar objective function $f : \mathbb{R}^n \to \mathbb{R}^1$
  ◮ Implemented as a computer program
  ◮ The evaluation is a sequence of decomposed elemental functions: for $k = 1, 2, \ldots, l$, $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
◮ Indexing convention:
  ◮ Independent variables: $v_{1-n}, \ldots, v_0$
  ◮ Intermediate variables: $v_1, \ldots, v_{l-1}$
  ◮ Dependent variable: $v_l$
◮ Example: y = pow(pow(x*x, 2.0), x), with $x > 0$ and $y = x^{4x}$
  ◮ v_0 <<= x
  ◮ $v_1 = \varphi_1(v_0) = v_0 * v_0$
  ◮ $v_2 = \varphi_2(v_1) = \mathrm{pow}(v_1, 2.0)$
  ◮ $v_3 = \varphi_3(v_2, v_0) = \mathrm{pow}(v_2, v_0)$
  ◮ v_3 >>= y
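The decomposition of the example into elemental functions can be sketched in a few lines of Python. This is only an illustration: the function name `evaluate_trace` and the test point `x = 1.5` are assumptions made here, not part of the slides.

```python
import math

def evaluate_trace(x):
    """Evaluate y = pow(pow(x*x, 2.0), x) one elemental function at a time.

    Hypothetical helper: returns the list [v0, v1, v2, v3].
    """
    v0 = x                    # independent variable
    v1 = v0 * v0              # phi_1(v0)
    v2 = math.pow(v1, 2.0)    # phi_2(v1)
    v3 = math.pow(v2, v0)     # phi_3(v2, v0)
    return [v0, v1, v2, v3]   # v3 is the dependent variable y

trace = evaluate_trace(1.5)
# For x > 0 the last entry should agree with the closed form y = x**(4x).
assert math.isclose(trace[-1], 1.5 ** (4 * 1.5))
```

Each assignment corresponds to one $v_k = \varphi_k(\cdot)$ in the trace, and the arguments on the right-hand side are exactly the predecessors $v_i \prec v_k$.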
Second Order Reverse Mode: Story Line
◮ First proposed by Gower and Mello [1]
  ◮ Initially called Edge Pushing
  ◮ Derived from the closed form of the second order derivative for composite functions
◮ Wang, Gebremedhin, and Pothen provided a second perspective by adopting live variable analysis from compiler theory [2]
  ◮ Better complexity bound
  ◮ Correct implementation
  ◮ Further improved with preaccumulation
◮ The new proof can be extended to general higher orders.
[1] Gower, Robert Mansel, and Margarida P. Mello. Hessian matrices via automatic differentiation. Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica, 2010.
[2] Wang, Mu, Assefaw Gebremedhin, and Alex Pothen. "Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation." Mathematical Programming Computation (2016): 1-41.
Reverse Mode of AD
◮ Function evaluation: evaluate each elemental function
  for $k = 1, 2, \ldots, l$: $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
◮ Reverse mode of AD: process the sequence of elemental functions in reverse order
  for $k = l, l-1, \ldots, 1$: do something with $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
◮ Equivalent function $f_k(S_k)$: a function defined by the elemental functions $\varphi_l, \ldots, \varphi_k$ that have been processed at the end of step $k$ in reverse mode
  ◮ $f = \underbrace{\varphi_l \circ \cdots \circ \varphi_k}_{f_k(S_k)} \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
  ◮ The independent variables of $f_k$ are denoted by $S_k$.
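As a rough illustration of the sets $S_k$, the sketch below tracks them for the running example y = pow(pow(x*x, 2.0), x). It assumes that processing $\varphi_k$ removes $v_k$ from the current set and adds the predecessors of $v_k$, and that the sweep starts from the set containing only $v_l$. The names `preds` and `live_sets` are hypothetical, chosen here for illustration.

```python
# preds[k] lists the indices i with v_i ≺ v_k for the running example:
# v1 = v0 * v0, v2 = pow(v1, 2.0), v3 = pow(v2, v0).
preds = {1: [0], 2: [1], 3: [2, 0]}
l = 3  # index of the dependent variable v_l

def live_sets(preds, l):
    """Return a dict {k: S_k} of variable indices for k = l+1, ..., 1."""
    # Assumption: before any elemental function is processed,
    # the only independent variable of the equivalent function is v_l.
    S = {l + 1: {l}}
    for k in range(l, 0, -1):  # reverse order: k = l, l-1, ..., 1
        # v_k is replaced by the variables it was computed from.
        S[k] = (S[k + 1] - {k}) | set(preds[k])
    return S

S = live_sets(preds, l)
assert S[3] == {0, 2}   # f_3 depends on v2 and v0
assert S[2] == {0, 1}   # f_2 depends on v1 and v0
assert S[1] == {0}      # f_1 = f depends only on the input v0
```

At the end of the sweep the set contains only independent variables, which matches the fact that $f_1$ is the original function $f$.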
Reverse Mode of AD
For $k = l, l-1, \ldots, 1$: do something with $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
◮ $f_k(S_k) = f_{k+1}(S_{k+1} \setminus \{v_k\},\ v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}})$
  $f = \overbrace{\varphi_l \circ \cdots \circ \varphi_{k+1}}^{f_{k+1}(S_{k+1})} \circ \varphi_k \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
  $f = \underbrace{\varphi_l \circ \cdots \circ \varphi_{k+1} \circ \varphi_k}_{f_k(S_k)} \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
◮ First order chain rule: $\frac{\partial f_k}{\partial v_i} = \frac{\partial f_{k+1}}{\partial v_i} + \frac{\partial v_k}{\partial v_i} \frac{\partial f_{k+1}}{\partial v_k}$
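The first order chain rule can be exercised on the example y = pow(pow(x*x, 2.0), x). The following is a minimal sketch, not the authors' implementation: `gradient_reverse` is a hypothetical name, the elemental partial derivatives are written out by hand, and the dictionary `adjoint` is assumed to hold $\partial f_{k+1} / \partial v_i$ for the variables currently in $S_{k+1}$.

```python
import math

def gradient_reverse(x):
    """Return (y, dy/dx) for y = pow(pow(x*x, 2.0), x), x > 0."""
    # Forward evaluation of the elemental functions.
    v0 = x
    v1 = v0 * v0
    v2 = math.pow(v1, 2.0)
    v3 = math.pow(v2, v0)

    # partials[k][i] = d v_k / d v_i for each predecessor v_i of v_k.
    partials = {
        1: {0: 2.0 * v0},
        2: {1: 2.0 * v1},
        3: {2: v0 * math.pow(v2, v0 - 1.0), 0: v3 * math.log(v2)},
    }

    # Assumption: the sweep starts with d f_{l+1} / d v_l = 1.
    adjoint = {3: 1.0}
    for k in (3, 2, 1):
        a_k = adjoint.pop(k)  # d f_{k+1} / d v_k; v_k leaves the live set
        for i, d in partials[k].items():
            # d f_k / d v_i = d f_{k+1} / d v_i + (d v_k / d v_i) * (d f_{k+1} / d v_k)
            adjoint[i] = adjoint.get(i, 0.0) + d * a_k
    return v3, adjoint[0]

y, dy = gradient_reverse(1.5)
# Closed form: y = x**(4x), so dy/dx = 4 * y * (ln x + 1).
assert math.isclose(dy, 4.0 * y * (math.log(1.5) + 1.0))
```

A missing entry in `adjoint` plays the role of $\partial f_{k+1} / \partial v_i = 0$ for a variable that was not yet in the live set, which is why `adjoint.get(i, 0.0)` is used in the update.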