Some Notes on Automatic Differentiation
Chih-Jen Lin
National Taiwan University
Last updated: May 25, 2020
Here we give some notes on the slides at https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf
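P6 I
The expression on the right means
$$
\begin{aligned}
\frac{\partial L}{\partial L} &= 1 \\
\frac{\partial L}{\partial y} &= y - t \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial y}\, \sigma'(z) \\
\frac{\partial L}{\partial w} &= \frac{\partial L}{\partial z} \cdot x \\
\frac{\partial L}{\partial b} &= \frac{\partial L}{\partial z} \cdot 1
\end{aligned}
$$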
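The five derivatives on this slide can be checked numerically. The sketch below assumes the model behind them is $z = wx + b$, $y = \sigma(z)$, $L = \frac{1}{2}(y - t)^2$ with $\sigma$ the logistic function; the notes do not restate the model, so this is an assumption consistent with $\partial L / \partial y = y - t$. The function names and the test values are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, t):
    # Assumed forward model: z = w*x + b, y = sigmoid(z), L = 0.5*(y - t)^2
    z = w * x + b
    y = sigmoid(z)
    return 0.5 * (y - t) ** 2

def loss_and_grads(w, b, x, t):
    # Forward pass, keeping the intermediate values z and y
    z = w * x + b
    y = sigmoid(z)
    L = 0.5 * (y - t) ** 2
    # Backward pass, in the same order as the derivatives on the slide
    dL_dL = 1.0
    dL_dy = dL_dL * (y - t)
    dL_dz = dL_dy * y * (1.0 - y)  # sigma'(z) = sigma(z) * (1 - sigma(z))
    dL_dw = dL_dz * x
    dL_db = dL_dz * 1.0
    return L, dL_dw, dL_db

# Arbitrary illustrative values
w, b, x, t = 0.5, -0.3, 2.0, 1.0
L, dL_dw, dL_db = loss_and_grads(w, b, x, t)

# Central finite differences as an independent check
eps = 1e-6
fd_w = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)
fd_b = (loss(w, b + eps, x, t) - loss(w, b - eps, x, t)) / (2 * eps)
print(abs(dL_dw - fd_w) < 1e-7, abs(dL_db - fd_b) < 1e-7)
```

Every quantity in the backward pass is a derivative of $L$ with respect to something, and each one reuses the quantity computed just before it.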
P6 II
“Transform the left-hand side into the right-hand side”: we want to calculate $\frac{\partial L}{\partial w}$, and it can be replaced by $\frac{\partial L}{\partial z} \cdot x$.
What we have discussed is the so-called reverse mode of automatic differentiation.
P6 III
We notice that in every expression we deal with $\frac{\partial L}{\partial (\text{something})}$.
Note that the backward pass used to calculate the gradient of neural networks is a special case of reverse-mode automatic differentiation.
There is also a forward mode of automatic differentiation, in which every node is $\frac{\partial (\text{something})}{\partial (\text{variables})}$.
P6 IV
We will probably see an example of forward mode later, when discussing Newton methods.
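As a small illustration of forward mode (not the Newton-method example referred to above), the sketch below propagates a pair (value, derivative with respect to $w$) through $z = wx + b$, $y = \sigma(z)$. The `Dual` class and the function `dy_dw` are hypothetical names introduced for this sketch, and only addition, multiplication, and the logistic function are supported.

```python
import math

class Dual:
    """A value paired with its derivative with respect to one chosen variable."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def sigmoid(d):
    s = 1.0 / (1.0 + math.exp(-d.val))
    # Chain rule: d sigma(z)/dw = sigma'(z) * dz/dw
    return Dual(s, s * (1.0 - s) * d.dot)

def dy_dw(w, b, x):
    w = Dual(w, 1.0)   # seed: dw/dw = 1
    b = Dual(b, 0.0)   # db/dw = 0
    z = w * x + b      # z.dot holds dz/dw
    y = sigmoid(z)     # y.dot holds dy/dw
    return y.val, y.dot

y_val, y_dot = dy_dw(0.5, -0.3, 2.0)
print(y_val, y_dot)
```

Here every node carries $\partial(\text{node}) / \partial w$, whereas in reverse mode every node carries $\partial L / \partial(\text{node})$.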
P13 I
The things shown on this slide are more general.
Even for scalars we do the same thing.
We will see this on p15.
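P15 I
We want to calculate
$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial x}$$
So we need $\frac{\partial L}{\partial y}$ and $\frac{\partial y}{\partial x}$.
Further, $\frac{\partial y}{\partial x}$ may involve $y$ or $x$.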
P15 II
That’s why it says that to get $\frac{\partial L}{\partial x}$ we need $\frac{\partial L}{\partial y}$, $x$, and $y$.
P15 III
Example: $y = \exp(x)$
$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial x} = \frac{\partial L}{\partial y} \exp(x) = \frac{\partial L}{\partial y}\, y$$
P15 IV
For this case we need $\frac{\partial L}{\partial y}$ and $y$.
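The $y = \exp(x)$ example can be written as a small function. The sketch below uses two hypothetical helpers, `exp_vjp` and `square_vjp`, whose argument order (incoming gradient, output value, input) is chosen for illustration. The first uses only $\frac{\partial L}{\partial y}$ and $y$; the second, for $y = x^2$, is a contrasting case where $\frac{\partial y}{\partial x} = 2x$ involves the input $x$ instead.

```python
import math

def exp_vjp(outgrad, value, x):
    """For y = exp(x): outgrad is dL/dy, value is y. The input x is unused."""
    return outgrad * value

def square_vjp(outgrad, value, x):
    """For y = x**2: dy/dx = 2x needs the input x; the output value is unused."""
    return outgrad * 2.0 * x

# Illustrative check with L = 0.5 * y**2 and y = exp(x), so dL/dy = y
x = 0.3
y = math.exp(x)
dL_dy = y
dL_dx = exp_vjp(dL_dy, y, x)

# Analytically L = 0.5 * exp(2x), hence dL/dx = exp(2x)
print(abs(dL_dx - math.exp(2.0 * x)) < 1e-12)
```

Passing the incoming gradient, the output value, and the inputs to every such function covers both cases, which is why all three are listed as needed in general.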
P17 I
The lines

    for argnum, parent in zip(argnums, node.parents):
        vjp = primitive_vjps[fun][argnum]
        parent_grad = vjp(outgrad, value, *args,
        outgrads[parent] = add_outgrads(outgrads.

roughly correspond to
$$\frac{\partial L}{\partial x_j} = \sum_i \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial x_j}$$
on p13.
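To see how such a loop accumulates the sum over $i$, here is a minimal self-contained sketch. It is not the code from the slides: the `Node` class, the string keys in `primitive_vjps`, the `backward` function, and the way the two truncated lines are completed are all assumptions made for illustration.

```python
import math

class Node:
    """Hypothetical graph node: output value, producing function, argument
    values, and which argument positions came from which parent nodes."""

    def __init__(self, value, fun=None, args=(), argnums=(), parents=()):
        self.value = value
        self.fun = fun
        self.args = args
        self.argnums = argnums
        self.parents = parents

# One VJP per (function, argument position); g is dL/d(output)
primitive_vjps = {
    "exp": {0: lambda g, value, a: g * value},
    "mul": {0: lambda g, value, a, b: g * b,
            1: lambda g, value, a, b: g * a},
    "add": {0: lambda g, value, a, b: g,
            1: lambda g, value, a, b: g},
}

def add_outgrads(prev, new):
    # First contribution is stored as is; later ones are summed
    return new if prev is None else prev + new

def backward(nodes_in_reverse_order, end_node):
    outgrads = {end_node: 1.0}  # dL/dL = 1
    for node in nodes_in_reverse_order:
        outgrad = outgrads.get(node)
        if outgrad is None or node.fun is None:
            continue  # input node, or node that does not affect the output
        for argnum, parent in zip(node.argnums, node.parents):
            vjp = primitive_vjps[node.fun][argnum]
            parent_grad = vjp(outgrad, node.value, *node.args)
            outgrads[parent] = add_outgrads(outgrads.get(parent), parent_grad)
    return outgrads

# y = exp(x) + x * exp(x); both x and u feed into two later nodes
x = Node(0.5)
u = Node(math.exp(x.value), "exp", (x.value,), (0,), (x,))
v = Node(x.value * u.value, "mul", (x.value, u.value), (0, 1), (x, u))
y = Node(u.value + v.value, "add", (u.value, v.value), (0, 1), (u, v))

outgrads = backward([y, v, u, x], y)
print(outgrads[x], (2.0 + x.value) * math.exp(x.value))
```

Because `x` is used by both `u` and `v`, its entry in `outgrads` receives two contributions, one per term of the sum; analytically the derivative is $(2 + x)\exp(x)$.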
P20 I
We do not discuss pages 20-23.