Differentiable Functional Programming: Atılım Güneş Baydin, University of Oxford

  1. Differentiable Functional Programming Atılım Güneş Baydin University of Oxford http://www.robots.ox.ac.uk/~gunes/ F#unctional Londoners Meetup, April 28, 2016

  2. About me
     Current (from 11 April 2016): Postdoctoral researcher, Machine Learning Research Group, University of Oxford, http://www.robots.ox.ac.uk/~parg/
     Previously: Brain and Computation Lab, National University of Ireland Maynooth, http://www.bcl.hamilton.ie/
     Working primarily with F#, on algorithmic differentiation, functional programming, machine learning

  3. Today’s talk: derivatives in computer programs; differentiable functional programming; DiffSharp + Hype libraries; two demos

  4. Derivatives in computer programs How do we compute them?

  5. Manual differentiation

     $f(x) = \sin(\exp x)$

         let f x = sin (exp x)

     Calculus 101: differentiation rules

     $\frac{d(fg)}{dx} = \frac{df}{dx}\,g + f\,\frac{dg}{dx}$, $\quad \frac{d(af + bg)}{dx} = a\,\frac{df}{dx} + b\,\frac{dg}{dx}$, $\quad \dots$

     $f'(x) = \cos(\exp x) \times \exp x$

         let f' x = (cos (exp x)) * (exp x)

  8. Manual differentiation: it can get complicated

     $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$ (4th iteration of the logistic map $l_{n+1} = 4 l_n (1 - l_n)$, $l_1 = x$)

         let f x = 64.0*x * (1.0-x) * ((1.0 - 2.0*x) ** 2.0) * ((1.0 - 8.0*x + 8.0*x*x) ** 2.0)

     $f'(x) = 128x(1-x)(-8+16x)(1-2x)^2(1-8x+8x^2) + 64(1-x)(1-2x)^2(1-8x+8x^2)^2 - 64x(1-2x)^2(1-8x+8x^2)^2 - 256x(1-x)(1-2x)(1-8x+8x^2)^2$

         let f' x =
             128.0*x * (1.0-x) * (-8.0+16.0*x) * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x) +
             64.0 * (1.0-x) * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x)**2.0 -
             64.0*x * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x)**2.0 -
             256.0*x * (1.0-x) * (1.0-2.0*x) * (1.0-8.0*x+8.0*x*x)**2.0

  10. Symbolic differentiation: computer algebra packages help (Mathematica, Maple, Maxima), but symbolic differentiation has some serious drawbacks

  12. Symbolic differentiation: we get “expression swell”

      Logistic map $l_{n+1} = 4 l_n (1 - l_n)$, $l_1 = x$

      n = 1: $l_n = x$; $\frac{d}{dx} l_n = 1$
      n = 2: $l_n = 4x(1-x)$; $\frac{d}{dx} l_n = 4(1-x) - 4x$
      n = 3: $l_n = 16x(1-x)(1-2x)^2$; $\frac{d}{dx} l_n = 16(1-x)(1-2x)^2 - 16x(1-2x)^2 - 64x(1-x)(1-2x)$
      n = 4: $l_n = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$; $\frac{d}{dx} l_n = 128x(1-x)(-8+16x)(1-2x)^2(1-8x+8x^2) + 64(1-x)(1-2x)^2(1-8x+8x^2)^2 - 64x(1-2x)^2(1-8x+8x^2)^2 - 256x(1-x)(1-2x)(1-8x+8x^2)^2$

      [Figure: number of terms in $l_n$ and $\frac{d}{dx} l_n$ plotted against $n = 1, \dots, 5$; vertical axis from 0 to 600]

  13. Symbolic differentiation: we are limited to closed-form formulae

      You can find the derivative of math expressions:

      $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$

      But not of algorithms, branching, control flow:

          let f x n =
              if n = 1 then
                  x
              else
                  let mutable v = x
                  for i = 1 to n - 1 do
                      v <- 4.0 * v * (1.0 - v)
                  v

          let a = f x 4

  16. Numerical differentiation

      A very common hack: use the limit definition of the derivative

      $\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$

      to approximate the numerical value of the derivative

          let diff f x =
              let h = 0.00001
              (f (x + h) - f (x)) / h

      Again, some serious drawbacks

  19. Numerical differentiation: we must select a proper value of $h$, and we face approximation errors

      [Figure: log-log plot of the error $E(h, x^*)$ (vertical axis, $10^{-10}$ to $10^{2}$) against $h$ (horizontal axis, $10^{-17}$ to $10^{-1}$); round-off error is dominant at small $h$, truncation error is dominant at large $h$]

      Computed using $E(h, x^*) = \left| \frac{f(x^* + h) - f(x^*)}{h} - \frac{d}{dx} f(x) \Big|_{x^*} \right|$, with $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$ and $x^* = 0.2$
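The trade-off on this slide can be reproduced with a short sweep over $h$. The following is a minimal sketch, not code from the talk: `diffH` and `error` are names introduced here, `diffH` is the forward difference from the earlier slide with the step size made a parameter, and the hand-derived `f'` is assumed to serve as the exact reference.

```fsharp
// Fourth iterate of the logistic map (closed form from the earlier slides)
let f (x: float) =
    let q = 1.0 - 2.0 * x
    let p = 1.0 - 8.0 * x + 8.0 * x * x
    64.0 * x * (1.0 - x) * q * q * p * p

// Hand-derived derivative, used here as the reference value
let f' (x: float) =
    let q = 1.0 - 2.0 * x
    let p = 1.0 - 8.0 * x + 8.0 * x * x
    let t1 = 128.0 * x * (1.0 - x) * (-8.0 + 16.0 * x) * q * q * p
    let t2 = 64.0 * (1.0 - x) * q * q * p * p
    let t3 = 64.0 * x * q * q * p * p
    let t4 = 256.0 * x * (1.0 - x) * q * p * p
    t1 + t2 - t3 - t4

// Forward difference with the step size h as a parameter (hypothetical helper)
let diffH (g: float -> float) (h: float) (x: float) = (g (x + h) - g x) / h

// Absolute error E(h, x*) of the forward difference against the hand derivative
let error (h: float) (x: float) = abs (diffH f h x - f' x)

// Sweep h over 1e-1, 1e-3, ..., 1e-17 at x* = 0.2
for k in 1 .. 2 .. 17 do
    let h = 10.0 ** float (-k)
    printfn "h = 1e-%02d   E = %.3e" k (error h 0.2)
```

With double-precision floats the error should first shrink as $h$ decreases and then grow again once round-off takes over; at $h = 10^{-17}$ the sum `x + h` is no longer distinguishable from `x`, so the difference quotient collapses to zero.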

  20. Numerical differentiation: better approximations exist

      Higher-order finite differences, e.g. $\frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x} - h\mathbf{e}_i)}{2h} + O(h^2)$;
      Richardson extrapolation;
      Differential quadrature;

      but they increase rapidly in complexity and never completely eliminate the error
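As a rough illustration of the central-difference formula, the sketch below compares it with the forward difference on the scalar function $\sin(\exp x)$ from the manual differentiation slide. The names `forwardDiff`, `centralDiff`, `g`, `g'`, and the choices `h = 1e-4`, `x0 = 0.5` are assumptions made for this example only.

```fsharp
// One-sided (forward) difference, error O(h)
let forwardDiff (f: float -> float) (h: float) (x: float) =
    (f (x + h) - f x) / h

// Two-sided (central) difference, error O(h^2)
let centralDiff (f: float -> float) (h: float) (x: float) =
    (f (x + h) - f (x - h)) / (2.0 * h)

// Test function and its hand-derived derivative
let g (x: float) = sin (exp x)
let g' (x: float) = cos (exp x) * exp x

let h = 1e-4
let x0 = 0.5
let forwardError = abs (forwardDiff g h x0 - g' x0)
let centralError = abs (centralDiff g h x0 - g' x0)
printfn "forward error = %.3e, central error = %.3e" forwardError centralError
```

The central difference costs two function evaluations per derivative instead of one extra, and it still leaves a nonzero error, which is the point the slide makes.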

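  21. Numerical differentiation: poor performance

      For $f : \mathbb{R}^n \to \mathbb{R}$, approximate the gradient $\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$ using

      $\frac{\partial f(\mathbf{x})}{\partial x_i} \approx \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x})}{h}$, $\quad 0 < h \ll 1$

      We must repeat the function evaluation $n$ times to get $\nabla f$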
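The cost can be made visible by counting evaluations. This is a minimal sketch under stated assumptions: `numGrad` is a hypothetical helper written for this example, and `sumSquares` is an arbitrary objective chosen because its gradient $2\mathbf{x}$ is easy to check.

```fsharp
// Counter for function evaluations (a ref cell, so the function below can update it)
let evals = ref 0

// Example objective f : R^n -> R, chosen for illustration: sum of squares
let sumSquares (x: float[]) =
    evals.Value <- evals.Value + 1
    Array.sumBy (fun v -> v * v) x

// Forward-difference gradient: one evaluation at x plus one per coordinate
let numGrad (f: float[] -> float) (h: float) (x: float[]) =
    let fx = f x
    Array.init x.Length (fun i ->
        let xi = Array.copy x
        xi.[i] <- xi.[i] + h      // perturb only coordinate i
        (f xi - fx) / h)

let grad = numGrad sumSquares 1e-6 [| 1.0; 2.0; 3.0 |]
printfn "approximate gradient: %A, function evaluations: %d" grad evals.Value
```

For an input of length $n$ this performs $n$ perturbed evaluations on top of the one at $\mathbf{x}$, so the cost of one gradient grows linearly with the number of inputs.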

  22. Algorithmic differentiation (AD)

  23. Algorithmic differentiation

      Also known as automatic differentiation (Griewank & Walther, 2008)

      Gives numeric code that computes the function AND its derivatives at a given point

          f(a, b):
            c = a * b
            d = sin c
            return d

          f'(a, a', b, b'):
            (c, c') = (a*b, a'*b + a*b')
            (d, d') = (sin c, c' * cos c)
            return (d, d')

      Derivatives propagated at the elementary operation level, as a side effect, at the same time when the function itself is computed → Prevents the “expression swell” of symbolic derivatives

      Full expressive capability of the host language → Including conditionals, looping, branching
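One common way to get this behaviour is operator overloading on dual numbers. The sketch below is an illustration only and is not DiffSharp's implementation: the `Dual` record, `constant`, `logistic`, and `diff` are names introduced here, and only `+`, `-`, and `*` are overloaded. It differentiates the loop-and-branch version of the logistic map that the symbolic approach could not handle, using `n - 1` iterations so that `logistic x 4` matches the closed-form $l_4$.

```fsharp
// A dual number carries a primal value P and a tangent (derivative) value T
type Dual =
    { P: float
      T: float }
    static member (+) (a: Dual, b: Dual) = { P = a.P + b.P; T = a.T + b.T }
    static member (-) (a: Dual, b: Dual) = { P = a.P - b.P; T = a.T - b.T }
    // Product rule applied at the elementary-operation level
    static member (*) (a: Dual, b: Dual) = { P = a.P * b.P; T = a.T * b.P + a.P * b.T }

// Constants have zero tangent
let constant (c: float) = { P = c; T = 0.0 }

// Logistic map written as ordinary code with a branch and a loop
let logistic (x: Dual) (n: int) =
    if n = 1 then
        x
    else
        let mutable v = x
        for _ in 1 .. n - 1 do
            v <- constant 4.0 * v * (constant 1.0 - v)
        v

// Derivative of f at x: seed the input tangent with 1 and read the output tangent
let diff (f: Dual -> Dual) (x: float) = (f { P = x; T = 1.0 }).T

let d = diff (fun x -> logistic x 4) 0.2
printfn "d/dx l_4 at x = 0.2: %f" d
```

No symbolic expression for the derivative is ever built; each multiplication just carries one extra number along, which is why the result stays cheap as `n` grows.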

  27. Function evaluation traces

      All numeric evaluations are sequences of elementary operations: a “trace,” also called a “Wengert list” (Wengert, 1964)

          f(a, b):
            c = a * b
            if c > 0
              d = log c
            else
              d = sin c
            return d

      Trace of f(2, 3) (primal):

          a = 2
          b = 3
          c = a * b = 6
          d = log c = 1.791
          return d

      Trace of f(2, 3) (tangent):

          a = 2
          a' = 1
          b = 3
          b' = 0
          c = a * b = 6
          c' = a' * b + a * b' = 3
          d = log c = 1.791
          d' = c' * (1 / c) = 0.5
          return d, d'
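The tangent trace can be written out by hand as an F# function that returns the primal and tangent values together. This is a sketch following the slide line by line; `fTangent` is a name introduced here, and the seeds `a' = 1`, `b' = 0` select the derivative with respect to `a`.

```fsharp
// Returns (primal, tangent) of f(a, b), given the input tangents a' and b'
let fTangent (a: float, a': float) (b: float, b': float) =
    let c = a * b
    let c' = a' * b + a * b'          // product rule
    if c > 0.0 then
        let d = log c
        let d' = c' * (1.0 / c)       // derivative of log
        d, d'
    else
        let d = sin c
        let d' = c' * cos c           // derivative of sin
        d, d'

// Evaluate at (a, b) = (2, 3) with tangent seeds a' = 1, b' = 0
let d, d' = fTangent (2.0, 1.0) (3.0, 0.0)
printfn "d = %f, d' = %f" d d'
```

Only the branch actually taken contributes to the trace, so the derivative returned is the derivative of the code path that ran for this input.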
