Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007
Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4 Conclusion and Outlook 5 Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 1 / 17
Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17
Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17
Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Topics of this talk: 1 : Generation of sparsity pattern Step 3 : Evaluating Hessian-matrix products Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17
Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17
Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Algorithm I: Function Evaluation for i = 1 , . . . , n v i − n = x i for i = 1 , . . . , l v i = ϕ i ( v j ) j ≺ i y = v l with precedence relation j ≺ i : ϕ i ( v j ) j ≺ i = ϕ i ( v j ) or ϕ i ( v j ) j ≺ i = ϕ i ( v j , v l ) with j , l < i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17
Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17
Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j For sparse Hessians additionally nonlinear interaction domains ∂ 2 y � � j ≤ n : ⊆ N i � = 0 ∂ x i ∂ x j for i = 1 , . . . , n . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17
Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ then ∀ k ∈ X ˆ : N k ← N k ∪ X j else ∀ k ∈ X ˆ : N k ← N k ∪ X i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17
Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ then ∀ k ∈ X ˆ : N k ← N k ∪ X j else ∀ k ∈ X ˆ : N k ← N k ∪ X i Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17
Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j (1) if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i (2) if v i = ϕ i ( v j , v ˆ ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ (3) else ∀ k ∈ X j : N k ← N k ∪ X i (4) if v i linear in v ˆ then ∀ k ∈ X ˆ : N k ← N k ∪ X j (5) else ∀ k ∈ X ˆ : N k ← N k ∪ X i (6) Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17
Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17
Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Proof: Analyse set operations: j ≺ i X j ) ≤ 3 ¯ (1): OPS( X i = � n i (2): OPS( ∀ k ∈ X i : N k = N k ∪ X i ) ≤ ¯ n i ( 2 ˆ n + ˆ n ) = 3 ˆ n ¯ n i (3) - (6): same as (2) Details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17
Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17
Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Complexity [Griewank 2000]: TIME ( H ( x ) ˙ x ) ≤ ω soad TIME ( f ( x )) ω soad ∈ [ 7 , 10 ] with TIME ( H ( x ) S ) ≤ ω soad p TIME ( f ( x )) with ω soad ∈ [ 7 , 10 ] . Reduction possible ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17
Evaluating Hessian-matrix Products Computation of H ( x ) ˙ x ∈ R n for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , v i − n x i , v i − n = 0 , v i − n = 0 for i = 1 , . . . , l ∂ � = ϕ i ( v j ) j ≺ i , ˙ ϕ i ( v j ) j ≺ i ˙ v i v i = v j , ∂ v j j ≺ i y = ˙ ˙ ¯ v l = ¯ y = v l , v l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ � ¯ v j += ¯ ϕ i ( v j ) j ≺ i ˙ v i v k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , x i = v i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17
Evaluating Hessian-matrix Products Computation of H ( x ) ˙ x ∈ R n for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , v i − n x i , v i − n = 0 , v i − n = 0 for i = 1 , . . . , l ∂ � = ϕ i ( v j ) j ≺ i , ˙ ϕ i ( v j ) j ≺ i ˙ v i v i = v j , ∂ v j j ≺ i y = ˙ ˙ ¯ v l = ¯ y = v l , v l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ � ¯ v j += ¯ ϕ i ( v j ) j ≺ i ˙ v i v k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , x i = v i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17
Evaluating Hessian-matrix Products Computation of H ( x ) ˙ X ∈ R n × p for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , V i − n X i , v i − n = 0 , V i − n = 0 for i = 1 , . . . , l ∂ ˙ ϕ i ( v j ) j ≺ i ˙ � v i = ϕ i ( v j ) j ≺ i , V i = V j , ∂ v j j ≺ i Y = ˙ ˙ ¯ v l = ¯ y = v l , V l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ ¯ ϕ i ( v j ) j ≺ i ˙ � V j += ¯ v i V k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , X i = V i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17
Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17
Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Reduction of run time complexity: From [ 7 p , 10 p ] to [ 4 + 3 p , 4 + 6 p ] due to reduced number of recalculations. For details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17
Numerical Results Implementation New drivers for AD-tool ADOL-C hess_pat(tag, n, x, P, option); Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 10 / 17
Recommend
More recommend