Beautiful differentiation
Conal Elliott, LambdaPix
1 September, 2009, ICFP
Differentiation
Derivatives have many uses. For instance,
◮ optimization
◮ root-finding
◮ surface normals
◮ curve and surface tessellation
There are three common differentiation techniques.
◮ Numeric
◮ Symbolic
◮ “Automatic” (forward & reverse modes)
What’s a derivative?
For scalar domain:
    d :: Scalar s ⇒ (s → s) → (s → s)
$$ d\,f\,x = \lim_{\varepsilon \to 0} \frac{f\,(x + \varepsilon) - f\,x}{\varepsilon} $$
What about non-scalar domains? Return to this question later.
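The limit itself cannot be computed directly, but fixing a small ε gives the numeric technique from the list of three. The following is a minimal sketch, assuming `Double` as the scalar type; the names `forwardD` and `eps` are illustrative and do not come from the talk.

```haskell
-- Forward-difference approximation of the derivative with a fixed step.
-- `forwardD` and `eps` are hypothetical names chosen for this sketch.
forwardD :: Double -> (Double -> Double) -> (Double -> Double)
forwardD eps f x = (f (x + eps) - f x) / eps
```

For example, `forwardD 1e-6 (\x -> x * x) 3` comes out close to 6 but not exactly 6, because of truncation and rounding error. That inexactness is what separates the numeric technique from the other two.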
Aside: We can treat functions like numbers.
    instance Num β ⇒ Num (α → β) where
      u + v = λx → u x + v x
      u ∗ v = λx → u x ∗ v x
      ...
    instance Floating β ⇒ Floating (α → β) where
      sin u = λx → sin (u x)
      cos u = λx → cos (u x)
      ...
We can treat applicatives like numbers.
    instance Num β ⇒ Num (α → β) where
      (+) = liftA2 (+)
      (∗) = liftA2 (∗)
      ...
    instance Floating β ⇒ Floating (α → β) where
      sin = fmap sin
      cos = fmap cos
      ...
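The elided methods can be filled in the same pointwise way. The following is one possible completion in plain Haskell, written as a sketch: the `Fractional` instance and the methods beyond those on the slides are assumptions added so that the module compiles on its own.

```haskell
import Control.Applicative (liftA2)

-- Pointwise arithmetic on functions, via the Applicative instance of ((->) a).
instance Num b => Num (a -> b) where
  (+)         = liftA2 (+)
  (*)         = liftA2 (*)
  (-)         = liftA2 (-)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum
  fromInteger = pure . fromInteger   -- a literal becomes a constant function

-- Needed as a superclass of Floating; not shown on the slides.
instance Fractional b => Fractional (a -> b) where
  recip        = fmap recip
  fromRational = pure . fromRational

instance Floating b => Floating (a -> b) where
  pi    = pure pi
  exp   = fmap exp
  log   = fmap log
  sqrt  = fmap sqrt
  sin   = fmap sin
  cos   = fmap cos
  asin  = fmap asin
  acos  = fmap acos
  atan  = fmap atan
  sinh  = fmap sinh
  cosh  = fmap cosh
  asinh = fmap asinh
  acosh = fmap acosh
  atanh = fmap atanh
```

With these instances, `sin + cos` is itself a function, and a numeric literal such as `2` can stand for the constant function returning 2.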
What is automatic differentiation?
◮ Computes function & derivative values in tandem
◮ “Exact” method
◮ Numeric, not symbolic
Scalar, first-order AD
Overload functions to work on function/derivative value pairs:
    data D α = D α α
For instance,
    D a a′ + D b b′ = D (a + b) (a′ + b′)
    D a a′ ∗ D b b′ = D (a ∗ b) (b′ ∗ a + a′ ∗ b)
    sin (D a a′)    = D (sin a) (a′ ∗ cos a)
    sqrt (D a a′)   = D (sqrt a) (a′ / (2 ∗ sqrt a))
    ...
Are these definitions correct?
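To make these overloadings concrete, here is a self-contained sketch in plain Haskell. The four rules from the slide appear unchanged. The helpers `constD` and `idD`, the sample function `f`, and every method not listed on the slide are assumptions added here, using the standard calculus derivatives.

```haskell
-- A value paired with its first derivative, as on the slide.
data D a = D a a deriving (Eq, Show)

-- Hypothetical helpers (names are not from the talk).
constD :: Num a => a -> D a   -- a constant has derivative 0
constD c = D c 0

idD :: Num a => a -> D a      -- the identity function has derivative 1
idD x = D x 1

instance Num a => Num (D a) where
  D a a' + D b b' = D (a + b) (a' + b')
  D a a' - D b b' = D (a - b) (a' - b')
  D a a' * D b b' = D (a * b) (b' * a + a' * b)
  negate (D a a') = D (negate a) (negate a')
  abs    (D a a') = D (abs a) (a' * signum a)
  signum (D a _)  = D (signum a) 0
  fromInteger     = constD . fromInteger

instance Fractional a => Fractional (D a) where
  recip (D a a') = D (recip a) (negate a' / (a * a))
  fromRational   = constD . fromRational

instance Floating a => Floating (D a) where
  pi             = constD pi
  exp   (D a a') = D (exp a)   (a' * exp a)
  log   (D a a') = D (log a)   (a' / a)
  sqrt  (D a a') = D (sqrt a)  (a' / (2 * sqrt a))
  sin   (D a a') = D (sin a)   (a' * cos a)
  cos   (D a a') = D (cos a)   (negate a' * sin a)
  asin  (D a a') = D (asin a)  (a' / sqrt (1 - a * a))
  acos  (D a a') = D (acos a)  (negate a' / sqrt (1 - a * a))
  atan  (D a a') = D (atan a)  (a' / (1 + a * a))
  sinh  (D a a') = D (sinh a)  (a' * cosh a)
  cosh  (D a a') = D (cosh a)  (a' * sinh a)
  asinh (D a a') = D (asinh a) (a' / sqrt (a * a + 1))
  acosh (D a a') = D (acosh a) (a' / sqrt (a * a - 1))
  atanh (D a a') = D (atanh a) (a' / (1 - a * a))

-- Hypothetical sample function: any numeric code written against Num works.
f :: Num a => a -> a
f x = x * x + 3 * x
```

Applying `f` to `idD 2` yields the pair of `f 2` and its derivative at 2, that is, `D 10 7`. The function `f` itself was written with no knowledge of `D`.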
What is automatic differentiation, really?
◮ What does AD mean?
◮ How does a correct implementation arise?
◮ Where else might these answers take us?
What does AD mean?
    data D α = D α α
    toD :: (α → α) → (α → D α)
    toD f = λx → D (f x) (d f x)
Spec: toD combinations correspond to function combinations, e.g.,
    toD u + toD v ≡ toD (u + v)
    toD u ∗ toD v ≡ toD (u ∗ v)
    recip (toD u) ≡ toD (recip u)
    sin (toD u)   ≡ toD (sin u)
    cos (toD u)   ≡ toD (cos u)
I.e., toD preserves structure.
How does a correct implementation arise?
Goal: ∀u. sin (toD u) ≡ toD (sin u)
Simplify each side:
    sin (toD u) ≡ λx → sin (toD u x)
                ≡ λx → sin (D (u x) (d u x))
    toD (sin u) ≡ λx → D (sin u x) (d (sin u) x)
                ≡ λx → D ((sin ◦ u) x) ((d u ∗ cos u) x)
                ≡ λx → D (sin (u x)) (d u x ∗ cos (u x))
Sufficient:
    sin (D ux dux) = D (sin ux) (dux ∗ cos ux)
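The same recipe appears to work for the other operations in the spec. As a worked illustration that is not taken from the slides, here is the multiplication case, using the product rule $d\,(u * v) \equiv d\,u * v + u * d\,v$ on functions:

```latex
\begin{align*}
\mathit{toD}\;u * \mathit{toD}\;v
  &\equiv \lambda x \to \mathit{toD}\;u\;x * \mathit{toD}\;v\;x \\
  &\equiv \lambda x \to D\;(u\;x)\;(d\;u\;x) * D\;(v\;x)\;(d\;v\;x) \\[4pt]
\mathit{toD}\;(u * v)
  &\equiv \lambda x \to D\;((u * v)\;x)\;(d\;(u * v)\;x) \\
  &\equiv \lambda x \to D\;(u\;x * v\;x)\;((d\;u * v + u * d\;v)\;x) \\
  &\equiv \lambda x \to D\;(u\;x * v\;x)\;(d\;u\;x * v\;x + u\;x * d\;v\;x)
\end{align*}
```

So it suffices that D a a′ ∗ D b b′ = D (a ∗ b) (a′ ∗ b + a ∗ b′). Up to commutativity of ∗ and +, this is the multiplication rule given for first-order AD.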
Where else might these answers take us?
In this talk
◮ Prettier definitions
◮ Higher-order derivatives
◮ Higher-dimensional functions
Prettier definitions
Digging deeper: the scalar chain rule
    d (g ◦ u) x ≡ d g (u x) ∗ d u x
For scalar domain & range. Variations for other dimensions.
Define and reuse:
    (g ⋈ dg) (D ux dux) = D (g ux) (dg ux ∗ dux)
For instance,
    sin  = sin  ⋈ cos
    cos  = cos  ⋈ λx → −sin x
    sqrt = sqrt ⋈ λx → recip (2 ∗ sqrt x)
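The chain-rule combinator can be written in plain Haskell roughly as follows. This is a sketch under stated assumptions: the ASCII operator name `>-<` and its fixity are choices made here, and `sinD`, `cosD`, and `sqrtD` are standalone hypothetical names, used instead of class methods so that the block compiles without any instances.

```haskell
infix 0 >-<

-- A value paired with its first derivative.
data D a = D a a deriving (Eq, Show)

-- Scalar chain rule: lift a function g and its derivative dg to pairs.
-- `>-<` is an assumed ASCII spelling of the combinator defined above.
(>-<) :: Num a => (a -> a) -> (a -> a) -> (D a -> D a)
(g >-< dg) (D ux dux) = D (g ux) (dg ux * dux)

-- Hypothetical standalone versions of the three examples.
sinD, cosD, sqrtD :: Floating a => D a -> D a
sinD  = sin  >-< cos
cosD  = cos  >-< \x -> negate (sin x)
sqrtD = sqrt >-< \x -> recip (2 * sqrt x)
```

Each definition now states only a function and its derivative. The bookkeeping for the inner derivative `dux` lives in one place.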
Function overloadings make for prettier definitions.
    instance Floating α ⇒ Floating (D α) where
      exp  = exp  ⋈ exp
      log  = log  ⋈ recip
      sqrt = sqrt ⋈ recip (2 ∗ sqrt)
      sin  = sin  ⋈ cos
      cos  = cos  ⋈ −sin
      acos = acos ⋈ recip (−sqrt (1 − sqr))
      atan = atan ⋈ recip (1 + sqr)
      sinh = sinh ⋈ cosh
      cosh = cosh ⋈ sinh

    sqr x = x ∗ x
Higher-order derivatives
Scalar, higher-order AD
Generate infinite towers of derivatives (Karczmarczuk 1998):
    data D α = D α (D α)
Suffices to tweak the chain rule:
    (g ⋈ dg) (D ux0 dux)    = D (g ux0) (dg ux0 ∗ dux)  -- old
    (g ⋈ dg) ux@(D ux0 dux) = D (g ux0) (dg ux ∗ dux)   -- new
Most other definitions can then go through unchanged.
The derivations adapt.
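A small runnable sketch of the tower representation follows. It relies on Haskell's laziness so that the infinite tower is only computed as far as it is inspected. The operator name `>-<`, the helpers `constD`, `idD`, and `derivs`, and the functions `sinD`, `cosD`, and `cube` are all assumptions introduced for this illustration.

```haskell
infix 0 >-<

-- A value together with the lazy, infinite tower of its derivatives.
data D a = D a (D a)

-- Hypothetical helpers (names are not from the talk).
constD :: Num a => a -> D a      -- all derivatives of a constant are 0
constD c = D c (constD 0)

idD :: Num a => a -> D a         -- identity: first derivative 1, then 0s
idD x = D x (constD 1)

derivs :: D a -> [a]             -- value, 1st derivative, 2nd derivative, ...
derivs (D a a') = a : derivs a'

-- The tweaked chain rule: dg now consumes the whole tower ux.
(>-<) :: Num a => (a -> a) -> (D a -> D a) -> (D a -> D a)
(g >-< dg) ux@(D ux0 dux) = D (g ux0) (dg ux * dux)

instance Num a => Num (D a) where
  D a a' + D b b'         = D (a + b) (a' + b')
  D a a' - D b b'         = D (a - b) (a' - b')
  x@(D a a') * y@(D b b') = D (a * b) (a' * y + x * b')
  negate (D a a')         = D (negate a) (negate a')
  abs                     = abs    >-< signum
  signum                  = signum >-< const 0
  fromInteger             = constD . fromInteger

-- Standalone examples; each derivative is itself tower-valued.
sinD, cosD :: Floating a => D a -> D a
sinD = sin >-< cosD
cosD = cos >-< negate . sinD

cube :: Num a => a -> a
cube x = x * x * x
```

Here `take 5 (derivs (cube (idD 2)))` gives the value and first four derivatives of x³ at 2, namely 8, 12, 12, 6, and 0. Note that the product rule multiplies whole towers, which is what lets later derivatives be produced on demand.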
Higher-dimensional functions
What’s a derivative, really?
For scalar domain:
$$ d\,f\,x = \lim_{\varepsilon \to 0} \frac{f\,(x + \varepsilon) - f\,x}{\varepsilon} $$
Redefine: unique scalar s such that
$$ \lim_{\varepsilon \to 0} \frac{f\,(x + \varepsilon) - f\,x}{\varepsilon} - s \equiv 0 $$
Equivalently,
$$ \lim_{\varepsilon \to 0} \frac{f\,(x + \varepsilon) - f\,x - s \cdot \varepsilon}{\varepsilon} \equiv 0 $$
or
$$ \lim_{\varepsilon \to 0} \frac{f\,(x + \varepsilon) - (f\,x + s \cdot \varepsilon)}{\varepsilon} \equiv 0 $$
Now generalize: unique linear map T such that:
$$ \lim_{\varepsilon \to 0} \frac{|\,f\,(x + \varepsilon) - (f\,x + T\,\varepsilon)\,|}{|\varepsilon|} \equiv 0 $$
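As a small worked instance of the linear-map definition, consider multiplication of two scalars. The function and the candidate map T below are chosen here for illustration and do not appear in the slides.

```latex
\begin{align*}
f\,(a, b) &= a \cdot b, \qquad
T\,(\varepsilon_1, \varepsilon_2) = b \cdot \varepsilon_1 + a \cdot \varepsilon_2 \\[4pt]
f\,((a, b) + (\varepsilon_1, \varepsilon_2)) - (f\,(a, b) + T\,(\varepsilon_1, \varepsilon_2))
  &= (a + \varepsilon_1)(b + \varepsilon_2) - a b - b\,\varepsilon_1 - a\,\varepsilon_2 \\
  &= \varepsilon_1\,\varepsilon_2 \\[4pt]
\frac{|\varepsilon_1\,\varepsilon_2|}{|\varepsilon|}
  &\le \frac{\tfrac{1}{2}\,(\varepsilon_1^2 + \varepsilon_2^2)}{|\varepsilon|}
   = \frac{|\varepsilon|}{2} \;\to\; 0
   \quad \text{as } \varepsilon \to 0
\end{align*}
```

So T is linear in ε and satisfies the limit condition at the point (a, b), using the Euclidean norm for |ε|. This is the familiar product rule restated as a linear map.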