Implementing autograd
Slides by Matthew Johnson
Autograd’s implementation
github.com/hips/autograd
Dougal Maclaurin, David Duvenaud, Matt Johnson
• differentiates native Python code
• handles most of Numpy + Scipy
• loops, branching, recursion, closures
• arrays, tuples, lists, dicts...
• derivatives of derivatives
• a one-function API!
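As a taste of that one-function API, here is a small usage example in the style of the autograd README; grad returns a function computing the derivative, and it can be applied again for higher derivatives.

    # A small usage example of autograd's one-function API, `grad`
    # (assumes autograd is installed: pip install autograd).
    import autograd.numpy as np   # thinly-wrapped NumPy
    from autograd import grad     # the one-function API

    def tanh(x):
        y = np.exp(-2.0 * x)
        return (1.0 - y) / (1.0 + y)

    dtanh = grad(tanh)       # a function computing d tanh / dx
    ddtanh = grad(dtanh)     # derivatives of derivatives just work
    print(dtanh(1.0), ddtanh(1.0))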
autodiff implementation options
A. direct specification of computation graph
B. source code inspection
C. monitoring function execution
ingredients:
1. tracing composition of primitive functions
2. vector-Jacobian product for each primitive
3. composing VJPs backward
[diagram: autograd.numpy.sum is a primitive wrapping numpy.sum. Called on a Node ã (value: a, parents: [x]), it unboxes the value a, calls numpy.sum on it, and boxes the result b into a new Node b̃ (value: b, function: anp.sum, parents: [ã]).]
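The diagram boils down to boxing and unboxing. Below is a minimal sketch of that idea; the names and signatures are simplified for illustration and are not autograd's actual classes.

    # Minimal sketch of tracing: a Node boxes a value together with the
    # function that produced it and its parent Nodes; `primitive` wraps a
    # raw NumPy function so it unboxes arguments, calls the original
    # function, and boxes the result.
    import numpy as np

    class Node:
        def __init__(self, value, function, parents):
            self.value = value        # the raw (unboxed) value, e.g. an ndarray
            self.function = function  # the primitive that produced this value
            self.parents = parents    # Nodes this value was computed from

    def primitive(raw_fun):
        def wrapped(*args):
            # unbox any Node arguments to their raw values
            values = [a.value if isinstance(a, Node) else a for a in args]
            parents = [a for a in args if isinstance(a, Node)]
            result = raw_fun(*values)               # call the real NumPy function
            return Node(result, wrapped, parents)   # box the result
        return wrapped

    anp_sum = primitive(np.sum)   # plays the role of autograd.numpy.sum here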
[diagram: tracing the forward pass builds a chain of nodes]
start_node: x
a = A(x)
b = B(a)
c = C(b)
y = D(c)   (end_node)
No control flow!
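Continuing the sketch above: each primitive call appends one Node, so Python control flow never appears in the trace; a loop simply unrolls into a straight chain.

    # Continuing the toy sketch: tracing sees only the primitive calls that
    # actually ran, so a Python loop unrolls into a linear chain of Nodes.
    anp_exp = primitive(np.exp)

    x = Node(np.array([1.0, 2.0, 3.0]), None, [])   # start_node
    z = x
    for _ in range(3):        # control flow is invisible to the trace
        z = anp_exp(z)        # each iteration just appends one more Node
    y = anp_sum(z)            # end_node
    # y.parents[0] is the last exp Node; following .parents leads back to x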
ingredients:
1. tracing composition of primitive functions
2. vector-Jacobian product for each primitive
3. composing VJPs backward
x, a = A(x)
given ∂y/∂a, what is ∂y/∂x?
∂y/∂x = ∂y/∂a · ∂a/∂x
vector-Jacobian product: ∂y/∂x = ∂y/∂a · A′(x)
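Concretely, a VJP maps the incoming gradient ∂y/∂a to ∂y/∂x without ever forming the Jacobian. Here are example VJPs for the two toy primitives from the sketch above (in autograd proper these are registered with defvjp).

    # Each VJP takes the incoming gradient g = dy/d(output) and the forward
    # input, and returns dy/d(input) = g * A'(x).
    def sum_vjp(g, x):
        # y = np.sum(x): every dy/dx_i equals g, broadcast to x's shape
        return g * np.ones_like(x)

    def exp_vjp(g, x):
        # a = np.exp(x): the Jacobian is diag(exp(x)), so multiply elementwise
        return g * np.exp(x)

    vjps = {anp_sum: sum_vjp, anp_exp: exp_vjp}   # primitive -> its VJP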
ingredients:
1. tracing composition of primitive functions
2. vector-Jacobian product for each primitive
3. composing VJPs backward
[diagram: the traced chain start_node x → a = A(x) → b = B(a) → c = C(b) → y = D(c) → end_node]
backward pass: start from ∂y/∂y = 1, then apply VJPs in reverse to get
∂y/∂c, then ∂y/∂b, then ∂y/∂a, and finally ∂y/∂x
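A minimal backward pass over the traced chain, composing those VJPs from the end_node back to the start_node; this is a sketch assuming the toy Node/primitive/vjps objects defined above and a single parent per node.

    # Toy backward pass (not autograd's actual implementation).
    def backward_pass(end_node, start_node):
        g = 1.0                          # dy/dy = 1
        node = end_node
        while node is not start_node:
            parent = node.parents[0]     # single-parent chain for simplicity
            vjp = vjps[node.function]    # look up this primitive's VJP
            g = vjp(g, parent.value)     # dy/dparent from dy/dnode
            node = parent
        return g                         # dy/dx at the start_node

    # e.g. gradient of y = sum(exp(x)) with respect to x:
    x = Node(np.array([0.0, 1.0]), None, [])
    y = anp_sum(anp_exp(x))
    print(backward_pass(y, x))           # elementwise exp([0.0, 1.0])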
higher-order autodiff just works: the backward pass can itself be traced
[diagram repeated: running the backward pass under the tracer builds a new graph from ∂y/∂y = 1 through ∂y/∂c, ∂y/∂b, ∂y/∂a down to ∂y/∂x, so its output can itself be differentiated]
ingredients:
1. tracing composition of primitive functions: Node, primitive, forward_pass
2. vector-Jacobian product for each primitive: defvjp
3. composing VJPs backward: backward_pass, make_vjp, grad
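Tying the toy pieces together into a grad-like wrapper; this is a sketch built from the snippets above, whereas autograd's real forward_pass, make_vjp, and grad also handle fan-out, multiple arguments, and nested containers.

    # How the pieces might fit together (toy version, not autograd's grad).
    def toy_grad(fun):
        def gradfun(x_value):
            start = Node(x_value, None, [])    # box the input (forward pass)
            end = fun(start)                   # trace the user's function
            return backward_pass(end, start)   # compose VJPs in reverse
        return gradfun

    f = lambda x: anp_sum(anp_exp(x))
    print(toy_grad(f)(np.array([0.0, 1.0])))   # same as exp([0.0, 1.0])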
what’s the point? easy to extend!
- develop autograd!
- forward mode
- log joint densities from sampler programs