Generalized Derivatives: Automatic Evaluation & Implications for Algorithms
Paul I. Barton, Kamil A. Khan & Harry A. J. Watson
Process Systems Engineering Laboratory, Massachusetts Institute of Technology

Nonsmooth Equation Solving
◆ Semismooth Newton method: $G(x^k)(x - x^k) = -f(x^k)$
◆ Linear programming (LP) Newton method:
  $\min_{\gamma, x} \; \gamma$
  s.t. $\|f(x^k) + G(x^k)(x - x^k)\|_\infty \le \gamma \|f(x^k)\|_\infty^2$
       $\|x - x^k\|_\infty \le \gamma \|f(x^k)\|_\infty$
       $x \in X$ (polyhedral set)
◆ $G(x^k)$: some element of a generalized derivative
Kojima & Shindo (1986), Qi & Sun (1993), Facchinei, Fischer & Herrich (2014).
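
The semismooth Newton iteration above can be sketched on a one-dimensional toy problem. The test equation `f`, the helper `g_element`, and the rule for choosing an element at the kink are all assumptions made for illustration; they do not come from the slides.

```python
def f(x):
    # Hypothetical nonsmooth test equation with a kink at x = 0; root at x = 1/3.
    return 2.0 * x + abs(x) - 1.0

def g_element(x):
    # One element of the B-subdifferential of f at x:
    # f'(x) = 3 for x > 0 and f'(x) = 1 for x < 0; at the kink we pick 3.
    return 3.0 if x >= 0.0 else 1.0

def semismooth_newton(x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            break
        # Solve G(x_k) (x - x_k) = -f(x_k) for the next iterate.
        x = x - fx / g_element(x)
    return x

root = semismooth_newton(-5.0)
```

Starting from the wrong side of the kink, the iterate crosses to the positive branch and then lands on the root, since `f` is linear on each piece.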

Generalized Derivatives
◆ Suppose $f$ locally Lipschitz ⇒ differentiable on a set $S$
◆ B-subdifferential: $\partial_B f(x) := \{ H : H = \lim_{i \to \infty} Jf(x^{(i)}),\; x = \lim_{i \to \infty} x^{(i)},\; x^{(i)} \in S \}$
◆ Clarke Jacobian: $\partial f(x) := \operatorname{conv} \partial_B f(x)$
◆ Example, $f(x) = |x|$:
  Ø $\partial f(x) = \{1\}$ for $x > 0$
  Ø $\partial f(x) = \{-1\}$ for $x < 0$
  Ø $\partial_B f(0) = \{-1, 1\}$, $\partial f(0) = [-1, 1]$
◆ Useful properties of $\partial f(x)$:
  Ø Nonempty, convex, and compact
  Ø Satisfies mean-value theorem, implicit/inverse function theorems
  Ø Reduces to subdifferential/derivative when $f$ is convex/strictly differentiable
Clarke (1973).

Convergence Properties
◆ Suppose generalized derivative contains no singular matrices at the solution
◆ Semismooth Newton method:
  Ø local Q-superlinear convergence if $G(x^k) \in \partial f(x^k)$
  Ø local Q-quadratic convergence if $f$ strongly semismooth
◆ Semismooth Newton & LP-Newton methods for PC$^1$ or strongly semismooth functions:
  Ø local Q-quadratic convergence if $G(x^k) \in \partial_B f(x^k)$
◆ Automatic/Algorithmic Differentiation (AD)
  Ø Automatic methods for computing derivatives in complex settings
  Ø Automatic method for computing elements of generalized derivatives?
  Ø Computationally relevant generalized derivatives

All generalized derivatives are equal… but some are more equal than others.

Obstacles to Automatic Gen. Derivative Evaluation 1
◆ Automatically evaluating Clarke Jacobian elements is difficult
◆ Lack of sharp calculus rules:
  Ø $g(x) = \max\{0, x\}$, with $0 \in \partial g(0) = [0, 1]$
  Ø $h(x) = \min\{0, x\}$, with $0 \in \partial h(0) = [0, 1]$
  Ø $f(x) = g(x) + h(x) = x$, but $(0 + 0) \notin \partial f(0) = \{1\}$
  Ø $\partial f(0) \subset \partial g(0) + \partial h(0)$
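
The failure of the sum rule can be checked numerically. This is a minimal sketch that hard-codes the Clarke subdifferentials of the two scalar functions as closed intervals; the helper names (`clarke_max0`, `clarke_min0`, `interval_sum`) are hypothetical.

```python
def clarke_max0(x):
    # Clarke subdifferential of g(x) = max{0, x} as a closed interval (lo, hi).
    if x > 0.0:
        return (1.0, 1.0)
    if x < 0.0:
        return (0.0, 0.0)
    return (0.0, 1.0)

def clarke_min0(x):
    # Clarke subdifferential of h(x) = min{0, x} as a closed interval (lo, hi).
    if x > 0.0:
        return (0.0, 0.0)
    if x < 0.0:
        return (1.0, 1.0)
    return (0.0, 1.0)

def interval_sum(a, b):
    # Minkowski sum of two closed intervals.
    return (a[0] + b[0], a[1] + b[1])

# The sum rule only yields an outer estimate of the subdifferential of f = g + h.
sum_rule = interval_sum(clarke_max0(0.0), clarke_min0(0.0))

# f(x) = x is smooth, so its Clarke subdifferential at 0 is the singleton {1}.
true_subdiff = (1.0, 1.0)

# Naively adding one valid element from each operand (0 from each) ...
naive = clarke_max0(0.0)[0] + clarke_min0(0.0)[0]
# ... produces a value that is not a generalized derivative of f at 0.
naive_is_valid = true_subdiff[0] <= naive <= true_subdiff[1]
```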

Directional Derivatives & PC$^1$ Functions
◆ Directional derivative: $f'(x; d) = \lim_{t \to 0^+} \dfrac{f(x + t d) - f(x)}{t}$
◆ Sharp chain rule for locally Lipschitz functions: $[f \circ g]'(x; d) = f'(g(x); g'(x; d))$
◆ AD gives the directional derivative
◆ PC$^1$ functions: finite collection $\mathcal{F}_f(x)$ of $C^1$ functions for which $f(y) \in \{ \phi(y) : \phi \in \mathcal{F}_f(x) \}$, $\forall y \in N(x)$ (a neighborhood of $x$)
◆ 2-norm not PC$^1$
Griewank (1994), Scholtes (2012).
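
A minimal sketch of how the sharp chain rule lets forward-mode propagation produce a directional derivative at a kink. The test function $f(x_1, x_2) = \max\{x_1, |x_2|\}$ and the helper names are assumptions chosen for illustration.

```python
def dir_abs(u, du):
    # abs'(u; du) = sign(u) * du away from the kink, and |du| at u = 0.
    if u > 0.0:
        return u, du
    if u < 0.0:
        return -u, -du
    return 0.0, abs(du)

def dir_max(u, du, v, dv):
    # max'((u, v); (du, dv)) picks the active branch; on a tie it is max(du, dv).
    if u > v:
        return u, du
    if v > u:
        return v, dv
    return u, max(du, dv)

def f_dir(x, d):
    # f(x1, x2) = max{x1, |x2|}, propagated with the chain rule
    # [f o g]'(x; d) = f'(g(x); g'(x; d)).
    a, da = dir_abs(x[1], d[1])
    return dir_max(x[0], d[0], a, da)

value, slope = f_dir((0.0, 0.0), (1.0, -2.0))

# One-sided finite-difference check of the same directional derivative.
t = 1e-6
fd_slope = (max(t * 1.0, abs(t * -2.0)) - max(0.0, abs(0.0))) / t
```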

Obstacles 2
◆ PC$^1$ functions have piecewise linear directional derivative
  [Figure: the $(d_1, d_2)$ plane divided into three regions on which $f'(x; d) = B_{(1)} d$, $f'(x; d) = B_{(2)} d$, and $f'(x; d) = B_{(3)} d$]
◆ Directional derivatives in the coordinate directions do not necessarily give B-subdifferential elements
◆ Also defeats finite differences
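
A small numerical illustration of this obstacle, assuming the test function $f(x_1, x_2) = \max\{x_1, x_2\}$ at the origin (not taken from the slides). Its B-subdifferential there is $\{[1\ 0], [0\ 1]\}$ and every Clarke element has entries summing to 1, yet one-sided finite differences along the coordinate directions assemble the row $[1\ 1]$.

```python
def f(x1, x2):
    # Hypothetical PC^1 test function with a kink along x1 = x2.
    return max(x1, x2)

def one_sided_fd(x, d, t=1e-6):
    # Forward finite difference approximating the directional derivative f'(x; d).
    return (f(x[0] + t * d[0], x[1] + t * d[1]) - f(x[0], x[1])) / t

x = (0.0, 0.0)

# "Jacobian" assembled from the two coordinate directions.
row = [one_sided_fd(x, (1.0, 0.0)), one_sided_fd(x, (0.0, 1.0))]

# The B-subdifferential of f at the origin.
b_subdifferential = [[1.0, 0.0], [0.0, 1.0]]

# Clarke elements are convex combinations [s, 1 - s], so their entries sum to 1.
in_b_subdifferential = row in b_subdifferential
in_clarke = abs(sum(row) - 1.0) < 1e-9 and min(row) >= 0.0
```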

Obstacles 3
◆ $\partial f(x)$ may be a strict subset of $\prod_{i=1}^{m} \partial f_i(x)$
  Ø $f : (x_1, x_2) \mapsto \begin{bmatrix} x_1 + |x_2| \\ x_1 - |x_2| \end{bmatrix}$
  Ø $\partial f(0) = \left\{ \begin{bmatrix} 1 & 2s - 1 \\ 1 & 1 - 2s \end{bmatrix} : s \in [0, 1] \right\}$
  Ø $\partial f_1(0) \times \partial f_2(0) = \left\{ \begin{bmatrix} 1 & 2s_1 - 1 \\ 1 & 2s_2 - 1 \end{bmatrix} : (s_1, s_2) \in [0, 1]^2 \right\}$
  [Figure: $\pi_2 \partial f(0)$ compared with $\pi_2(\partial f_1(0) \times \partial f_2(0))$]

L-smooth Functions
$f : X \subset \mathbb{R}^n \to \mathbb{R}^m$
◆ The following functions are L-smooth:
  Ø Continuously differentiable functions
  Ø Convex functions (e.g. abs, 2-norm)
  Ø PC$^1$ functions
  Ø Compositions of L-smooth functions: $x \mapsto h(g(x))$
  Ø Integrals of L-smooth functions: $x \mapsto \int_a^b g(t, x)\, dt$
  Ø Solutions of ODEs with L-smooth right-hand sides: $c \mapsto x(b, c)$, where $\dfrac{dx}{dt}(t, c) = g(t, x(t, c))$, $x(0, c) = c$
Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).

Lexicographic Derivatives
◆ L-subdifferential: $\partial_L f(x) = \{ J_L f(x; M) : \det M \ne 0 \}$
  Ø Contains L-derivatives in directions $M$: $J_L f(x; M)$, $\det M \ne 0$
◆ Useful properties:
  Ø L-derivatives are the classical derivative wherever $f$ is strictly differentiable
  Ø L-derivatives are elements of the Clarke gradient
  Ø Contains only subgradients when $f$ convex
  Ø Contained in plenary hull of Clarke Jacobian, and can be used in place of Clarke Jacobian in numerical methods: $\{ A d : A \in \partial_L f(x) \} \subset \{ A d : A \in \partial f(x) \}$ for each $d \in \mathbb{R}^n$
  Ø For PC$^1$ functions, L-derivatives are elements of the B-subdifferential
  Ø Satisfies sharp chain rule, expressed naturally using LD-derivatives
Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).

Lexicographic Directional (LD)-Derivatives
◆ Extension of classical directional derivative
◆ LD-derivative: for any $M := [m_{(1)} \cdots m_{(p)}] \in \mathbb{R}^{n \times p}$,
  $f'(x; M) = [\, f^{(0)}_{x,M}(m_{(1)}) \;\cdots\; f^{(p-1)}_{x,M}(m_{(p)}) \,]$
◆ If $M$ is square and nonsingular: $f'(x; M) = J_L f(x; M)\, M$
◆ If $f$ is differentiable at $x$: $f'(x; M) = Jf(x)\, M$
◆ Sharp LD-derivative chain rule: $[f \circ g]'(x; M) = f'(g(x); g'(x; M))$
Khan and Barton (2015).
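
The slide uses the functions $f^{(j)}_{x,M}$ without restating how they are built. The block below assumes the usual recursive construction from the lexicographic-differentiation literature (each function is the directional derivative of the previous one, taken at the previous direction) and works it through for an assumed example, $f(x_1, x_2) = \max\{x_1, x_2\}$ at $x = 0$ with $M = I$.

```latex
% Assumed recursive definition (not restated on the slide):
f^{(0)}_{x,M}(d) = f'(x; d), \qquad
f^{(j)}_{x,M}(d) = \bigl[f^{(j-1)}_{x,M}\bigr]'(m_{(j)}; d), \quad j = 1, \dots, p-1.

% Worked example: f(x_1, x_2) = \max\{x_1, x_2\}, \; x = 0, \; M = I.
f^{(0)}_{0,I}(d) = \max\{d_1, d_2\}, \qquad
f^{(0)}_{0,I}(m_{(1)}) = \max\{1, 0\} = 1,

f^{(1)}_{0,I}(d) = \bigl[f^{(0)}_{0,I}\bigr]'(m_{(1)}; d) = d_1, \qquad
f^{(1)}_{0,I}(m_{(2)}) = 0,

f'(0; I) = [\, 1 \;\; 0 \,], \qquad
J_L f(0; I) = f'(0; I)\, I^{-1} = [\, 1 \;\; 0 \,].
```

The result $[1\ 0]$ is one of the two B-subdifferential elements of the max function at the origin, in contrast to the row $[1\ 1]$ obtained from plain coordinate directional derivatives.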

Vector Forward AD Mode for LD-derivatives
◆ Sharp chain rule immediately implies, given the “seed directions” $M$, forward-mode AD can compute $f'(x; M)$
◆ Need calculus rules for “elementary functions”:
  Ø abs, min, max, mid, $\|\cdot\|_2$, etc.
  Ø algorithm for “elemental PC$^1$ functions”
  Ø linear programs and lexicographic linear programs parameterized by their RHSs
  Ø implicit function, $h(w(z), z) = 0$: $w'(\hat{z}; M)$ is the unique solution $N$ of $h'((\hat{y}, \hat{z}); (N, M)) = 0$
Khan and Barton (2015), Khan and Barton (2013), Hoeffner et al. (2015).
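
A minimal pure-Python sketch of a vector forward pass that carries an LD-derivative row alongside each value. The class `LD`, the helpers, and the 2×2 test are hypothetical; the abs rule used here (multiply the row by the sign of the first nonzero entry of the value followed by the row) is the rule as recalled from Khan and Barton (2015) and should be checked against that paper.

```python
class LD:
    """A value together with its LD-derivative row (one entry per column of M)."""
    def __init__(self, val, dot):
        self.val = float(val)
        self.dot = [float(v) for v in dot]

    def __add__(self, other):
        return LD(self.val + other.val, [a + b for a, b in zip(self.dot, other.dot)])

    def __sub__(self, other):
        return LD(self.val - other.val, [a - b for a, b in zip(self.dot, other.dot)])

    def scale(self, c):
        return LD(c * self.val, [c * a for a in self.dot])

def first_sign(u):
    # Sign of the first nonzero entry of [u.val, u.dot[0], ..., u.dot[p-1]].
    for entry in [u.val] + u.dot:
        if entry != 0.0:
            return 1.0 if entry > 0.0 else -1.0
    return 0.0

def ld_abs(u):
    s = first_sign(u)
    return LD(abs(u.val), [s * a for a in u.dot])

def ld_max(u, v):
    # max{u, v} = (u + v + |u - v|) / 2, so only the abs rule is needed.
    return (u + v + ld_abs(u - v)).scale(0.5)

def ld_derivative(func, x, M):
    # Seed variable i with row i of the direction matrix M, then run func.
    args = [LD(xi, M[i]) for i, xi in enumerate(x)]
    return func(*args).dot

def times_inverse_2x2(row, M):
    # Computes row * M^{-1} for a nonsingular 2x2 matrix M.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
    return [row[0] * inv[0][j] + row[1] * inv[1][j] for j in range(2)]

x = [0.0, 0.0]
M_identity = [[1.0, 0.0], [0.0, 1.0]]
M_swapped = [[0.0, 1.0], [1.0, 0.0]]

JL_identity = times_inverse_2x2(ld_derivative(ld_max, x, M_identity), M_identity)
JL_swapped = times_inverse_2x2(ld_derivative(ld_max, x, M_swapped), M_swapped)
```

With the identity seed the pass returns $[1\ 0]$ and with the swapped seed it returns $[0\ 1]$: both are B-subdifferential elements of the max function at the origin, and which one is obtained depends on the order of the directions.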

Semismooth Inexact Newton Method
◆ Inexact Newton method: Jacobian-vector products $J(x)\, d_i$, $i = 1, 2, \dots$
◆ Solve iteratively: $J_L f(x; M)\, \Delta x = -f(x)$
◆ But, directional derivative not a linear function of the directions…
◆ Let $M = [d_1, d_2, \dots]$, $M$ nonsingular. Then: $f'(x; M) = J_L f(x; M)\, M$
◆ But, $M$ not known in advance
◆ Compute columns of $f'(x; M)$ one at a time
  Ø computation of a column affects subsequent columns
  Ø automatic code can be “locked” to record influence of earlier columns
◆ Local Q-superlinear & Q-quadratic convergence rates can be achieved

Approximation of LD-derivatives using FDs
◆ LD-derivative: $M := [m_{(1)} \cdots m_{(p)}] \in \mathbb{R}^{n \times p}$,
  $f'(x; M) = [\, f^{(0)}_{x,M}(m_{(1)}) \;\cdots\; f^{(p-1)}_{x,M}(m_{(p)}) \,]$
◆ FD approx. of $f'(x; M)$ using $p + 1$ function evaluations:
  $f^{(0)}_{x,M}(m_{(1)}) \approx \alpha^{-1} [ f(x + \alpha m_{(1)}) - f(x) ] =: D^{\alpha}_{m_{(1)}}[f](x)$
  $f^{(1)}_{x,M}(m_{(2)}) \approx D^{\alpha}_{m_{(2)}}[f^{(0)}_{x,M}](m_{(1)}) = D^{\alpha}_{m_{(2)}} D^{\alpha}_{m_{(1)}}[f](x)$
  $\vdots$
  $f^{(p-1)}_{x,M}(m_{(p)}) \approx D^{\alpha}_{m_{(p)}}[f^{(p-2)}_{x,M}](m_{(p-1)}) = D^{\alpha}_{m_{(p)}} \cdots D^{\alpha}_{m_{(2)}} D^{\alpha}_{m_{(1)}}[f](x)$
◆ Example with $p = 2$: $f'(x; M) = [\, f^{(0)}_{x,M}(m_{(1)}) \;\; f^{(1)}_{x,M}(m_{(2)}) \,]$
  $f^{(0)}_{x,M}(m_{(1)}) \approx \alpha^{-1} [ f(x + \alpha m_{(1)}) - f(x) ]$
  $f^{(1)}_{x,M}(m_{(2)}) \approx \alpha^{-2} [ f(x + \alpha m_{(1)} + \alpha^2 m_{(2)}) - f(x + \alpha m_{(1)}) ]$
  [Figure: the evaluation points $x$, $x + \alpha m_{(1)}$, and $x + \alpha m_{(1)} + \alpha^2 m_{(2)}$]
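
The nested differences above telescope, so each column only needs one new function evaluation. The sketch below assumes a scalar-valued $f$ and extends the $p = 2$ pattern to general $p$ (column $j$ uses the step $\alpha^j$); the function name `fd_ld_derivative` and the test function are hypothetical, and for larger $p$ the step $\alpha^p$ may become too small for floating-point arithmetic.

```python
def fd_ld_derivative(f, x, directions, alpha=1e-3):
    """Approximate the columns of f'(x; M) for a scalar-valued f.

    directions is the list [m_(1), ..., m_(p)] of columns of M.
    Uses p + 1 evaluations of f.
    """
    point = list(x)
    f_prev = f(point)
    columns = []
    for j, m in enumerate(directions, start=1):
        step = alpha ** j
        # Move from x + sum_{i<j} alpha^i m_(i) to x + sum_{i<=j} alpha^i m_(i).
        point = [p_k + step * m_k for p_k, m_k in zip(point, m)]
        f_next = f(point)
        columns.append((f_next - f_prev) / step)
        f_prev = f_next
    return columns

def f_max(x):
    # Hypothetical PC^1 test function with a kink at the evaluation point.
    return max(x[0], x[1])

approx_identity = fd_ld_derivative(f_max, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
approx_swapped = fd_ld_derivative(f_max, [0.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

Both seeds give $f'(0; M) \approx [1\ 0]$; multiplying by $M^{-1}$ then yields $[1\ 0]$ for the identity and $[0\ 1]$ for the swapped seed, whereas plain coordinate finite differences would give $[1\ 1]$.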

Sparse Accumulation for L-derivatives
◆ Cost of AD can be reduced when the Jacobian is sparse
  Ø Find structurally orthogonal columns
  Ø Perform vector forward pass with seed matrix $M \in \mathbb{R}^{n \times p}$ rather than $I \in \mathbb{R}^{n \times n}$
  Ø Example: sparsity pattern $\begin{bmatrix} a & b & 0 & 0 \\ c & 0 & d & 0 \\ 0 & e & 0 & f \\ 0 & 0 & g & h \end{bmatrix}$ with seed matrix $M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$
◆ AD for LD-derivatives → order of the directions matters
  Ø Corresponding to $M$ is an uncompressed (permutation) matrix $Q = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$
    » $M = QD$ for some matrix $D$
  Ø Procedure:
    » Identify matrices $Q$, $D$, and $M$
    » Perform vector forward pass to calculate $f'(x; M)$
    » Copy entries of $f'(x; M)$ into entries of sparse data structure for $f'(x; Q)$
    » Calculate $J_L f(x; Q) = f'(x; Q)\, Q^{-1}$ (i.e. by sparse permutation)
◆ Done based on assumption that $f'(x; M) = f'(x; Q)\, D$
  Ø $f'(x; M) = f'(x; Q)\, D$ is not true in general
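
For the smooth case, the compression step can be sketched with the matrices from the slide. The numeric values standing in for $a, \dots, h$ are hypothetical, and $D$ is inferred here from $M = QD$ since the slide does not print it. In the nonsmooth LD-derivative setting the corresponding identity $f'(x; M) = f'(x; Q)\,D$ can fail, as the slide notes, so this sketch only illustrates the classical mechanism.

```python
def matmul(A, B):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical numeric values for the nonzeros a..h in the slide's sparsity pattern.
J = [[1.0, 2.0, 0.0, 0.0],
     [3.0, 0.0, 4.0, 0.0],
     [0.0, 5.0, 0.0, 6.0],
     [0.0, 0.0, 7.0, 8.0]]

# Seed matrix from the slide: columns {1, 4} and {2, 3} are structurally orthogonal.
M = [[1, 0], [0, 1], [0, 1], [1, 0]]

# Permutation matrix from the slide, and D inferred from M = Q D (an assumption).
Q = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
D = [[1, 0], [1, 0], [0, 1], [0, 1]]

# One vector forward pass with the compressed seed gives J M (4 x 2 instead of 4 x 4).
compressed = matmul(J, M)

# Column groups merged by each seed column (0-based Jacobian column indices).
groups = [[0, 3], [1, 2]]

# The sparsity pattern is assumed known in advance; it is read off J here for brevity.
pattern = [[entry != 0.0 for entry in row] for row in J]

# Copy each compressed entry back to the unique structurally nonzero position.
recovered = [[0.0] * 4 for _ in range(4)]
for k, group in enumerate(groups):
    for i in range(4):
        for j in group:
            if pattern[i][j]:
                recovered[i][j] = compressed[i][k]
```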

Generalized Derivatives of Algorithms: MHEX model
[Figure: MHEX block with hot streams $F_i$, $T_i^{in} \to T_i^{out}$ for $i = 1, \dots, |H|$, and cold streams $f_j$, $t_j^{in} \to t_j^{out}$ for $j = 1, \dots, |C|$]
◆ $\sum_{i \in H} F_i \left( T_i^{in} - T_i^{out} \right) = \sum_{j \in C} f_j \left( t_j^{out} - t_j^{in} \right)$
◆ $\min_{p \in P} \left( EBP_C^p - EBP_H^p \right) = 0$
◆ $UA - \sum_{k \in K,\, k \ne |K|} \dfrac{\Delta Q^k}{\Delta T_{LM}^k} = 0$
Watson et al. (2015).
