
How to compute a derivative: Computing derivatives of complicated functions



  1. How to compute a derivative

  2. Computing derivatives of complicated functions • How do you compute the derivatives in an LSTM or GRU cell? • How do you compute derivatives of complicated functions in general? • In these slides we will give you some hints • In the slides we will assume vector functions and vector activations • But we will also give you scalar versions of the equations to provide intuition • The two sets are almost identical, except that with vector functions the notation becomes uglier and less intuitive, and we must ensure that the dimensions come out right • Please compare the vector versions of the equations to their scalar counterparts for better intuition, if needed

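  3. First: Some notation and conventions • We will refer to the derivative of a scalar $L$ with respect to $x$ as $\nabla_x L$ • Regardless of whether the derivative is a scalar, vector, matrix or tensor • The derivative of a scalar $L$ w.r.t. an $N \times 1$ column vector $x$ is a $1 \times N$ row vector $\nabla_x L$ • The derivative of a scalar $L$ w.r.t. an $N \times M$ matrix $X$ is an $M \times N$ matrix $\nabla_X L$ • Remember our gradient update rule: $W \leftarrow W - \eta \, (\nabla_W L)^\top$ • The derivative of an $N \times 1$ vector $Y$ w.r.t. an $M \times 1$ vector $X$ is an $N \times M$ matrix $J_Y(X)$ • The Jacobian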

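  4. Rules: 1 (scalar) • $z = Wx$ • All terms are scalars • $\frac{\partial L}{\partial z}$ is known • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} W$ • $\frac{\partial L}{\partial W} = x \frac{\partial L}{\partial z}$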

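  5. Rules: 1 (vector) • $z = Wx$ • $z$ is an $N \times 1$ vector • $x$ is an $M \times 1$ vector • $W$ is an $N \times M$ matrix • $L$ is a function of $z$ • $\nabla_z L$ is known (and is a $1 \times N$ vector) • $\nabla_x L = (\nabla_z L)\, W$ • $\nabla_W L = x \, \nabla_z L$ Please verify that the dimensions match!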
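A minimal sketch of Rule 1 in plain Python, assuming the relation on this slide is the matrix-vector product z = Wx and that the derivative of the loss with respect to z is given as a row vector. The names `matvec` and `rule1_backward` are hypothetical, and vectors and matrices are stored as (nested) lists. Following the convention that a derivative has the transposed shape of its variable, the derivative for the N×M matrix W comes out as M×N.

```python
def matvec(W, x):
    """Forward: z = W x, with W given as N rows of length M and x of length M."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rule1_backward(W, x, grad_z):
    """Backward for z = W x, given grad_z = dL/dz (row vector of length N).

    Returns (grad_x, grad_W):
      grad_x = grad_z W   -> row vector of length M
      grad_W = x grad_z   -> M x N matrix (transposed shape of W)
    """
    n, m = len(W), len(x)
    grad_x = [sum(grad_z[i] * W[i][j] for i in range(n)) for j in range(m)]
    grad_W = [[x[j] * grad_z[i] for i in range(n)] for j in range(m)]
    return grad_x, grad_W


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # N = 2, M = 3
x = [1.0, 0.0, -1.0]
grad_z = [1.0, 1.0]        # corresponds to the loss L = z[0] + z[1]
grad_x, grad_W = rule1_backward(W, x, grad_z)
```

Checking the shapes by hand: `grad_x` has length M = 3 and `grad_W` has M = 3 rows of N = 2 entries, which is the dimension check the slide asks for.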

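  6. Rules: 2 (vector, Schur multiply) • $z = x \circ y$ • $x$, $y$ and $z$ are all $N \times 1$ vectors • "$\circ$" represents component-wise multiplication • $\nabla_z L$ is known (and is a $1 \times N$ vector) • $\nabla_x L = \nabla_z L \circ y^\top$ • $\nabla_y L = \nabla_z L \circ x^\top$ Please verify that the dimensions match!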
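A short sketch of the Schur-multiply rule, assuming the relation is z = x ∘ y with all three vectors of equal length. The function names are hypothetical. Each component of the incoming derivative is simply scaled by the matching component of the other operand.

```python
def schur(x, y):
    """Forward: component-wise (Schur) product z_i = x_i * y_i."""
    return [a * b for a, b in zip(x, y)]


def rule2_backward(x, y, grad_z):
    """Backward for z = x o y, given grad_z = dL/dz.

    grad_x_i = grad_z_i * y_i and grad_y_i = grad_z_i * x_i.
    """
    grad_x = [g * b for g, b in zip(grad_z, y)]
    grad_y = [g * a for g, a in zip(grad_z, x)]
    return grad_x, grad_y


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
grad_z = [1.0, 0.5, 2.0]
grad_x, grad_y = rule2_backward(x, y, grad_z)
```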

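  7. Rules: 3 (scalar) • $z = x + y$ • All terms are scalars • $\frac{\partial L}{\partial z}$ is known • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}$ • $\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}$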

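  8. Rules: 3 (vector) • $z = x + y$ • $x$, $y$ and $z$ are all $N \times 1$ vectors • $\nabla_z L$ is known (and is a $1 \times N$ vector) • $\nabla_x L = \nabla_z L$ • $\nabla_y L = \nabla_z L$ Please verify that the dimensions match!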

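  9. Rules: 4 (scalar) • $z = g(x)$ • $x$ and $z$ are scalars • $\frac{\partial L}{\partial z}$ is known • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\, g'(x)$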

  10. Rules: 4 (vector) • $z = g(x)$ • $x$ and $z$ are vectors • $\nabla_z L$ is known (and is a row vector) • $J_g(x)$ is the Jacobian of $g(x)$ with respect to $x$ • May be a diagonal matrix • $\nabla_x L = \nabla_z L \; J_g(x)$ Please verify that the dimensions match!

  11. Rules: 4b (vector) component-wise multiply notation • $z = g(x)$ • $x$ and $z$ are $N \times 1$ vectors • $\nabla_z L$ is known (and is a $1 \times N$ vector) • $g(x)$ is actually a vector of component-wise functions • i.e. $z_i = g(x_i)$ • $g'(x)$ is a column vector consisting of the derivatives of the individual components of $z$ w.r.t. the individual components of $x$ • $\nabla_x L = \nabla_z L \circ g'(x)^\top$ Please verify that the dimensions match!
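A small sketch of the component-wise case, assuming the activation is applied independently to each component so that the Jacobian is diagonal and the backward step reduces to a component-wise product. The sigmoid and tanh activations are chosen here because they are the ones an LSTM uses; the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)


def tanh_prime(v):
    return 1.0 - math.tanh(v) ** 2


def rule4b_backward(x, grad_z, g_prime):
    """Backward for z_i = g(x_i): grad_x_i = grad_z_i * g'(x_i)."""
    return [gz * g_prime(xi) for gz, xi in zip(grad_z, x)]


x = [0.0, 1.0, -2.0]
grad_z = [2.0, 1.0, 1.0]
grad_x_sigmoid = rule4b_backward(x, grad_z, sigmoid_prime)
grad_x_tanh = rule4b_backward(x, grad_z, tanh_prime)
```

Because only the diagonal of the Jacobian is non-zero, the full N×N matrix never needs to be formed.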

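  12. Rule 5: Addition of derivatives • Given two variables $y = g(x)$ and $z = h(x)$ • And given $\frac{\partial L}{\partial y}$ and $\frac{\partial L}{\partial z}$ • we get $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{dy}{dx} + \frac{\partial L}{\partial z}\frac{dz}{dx}$ • The rule also extends to vector derivatives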
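A worked scalar illustration of the addition rule, using made-up functions y = 3x and z = x² and a loss L = yz (all three are assumptions chosen only so the result can be checked by hand: L = 3x³, so dL/dx = 9x²). The derivative for x starts at zero and receives one contribution per path.

```python
def rule5_example(x):
    """Accumulate dL/dx over both paths through which x influences L."""
    # Forward
    y = 3.0 * x          # y = g(x)
    z = x * x            # z = h(x)
    L = y * z            # loss depends on x only through y and z

    # Backward: derivatives of L w.r.t. y and z
    dL_dy = z
    dL_dz = y

    # Rule 5: add the contribution from each path
    dL_dx = 0.0
    dL_dx += dL_dy * 3.0         # path through y, dy/dx = 3
    dL_dx += dL_dz * 2.0 * x     # path through z, dz/dx = 2x
    return dL_dx


result = rule5_example(2.0)   # 9 * 2^2 = 36
```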

  13. Computing derivatives of complex functions • We are now prepared to compute very complex derivatives • Procedure: • Express the computation as a series of computations of intermediate values • Each computation must comprise either a unary or a binary relation • Unary relation: the RHS has one argument, e.g. $y = g(x)$ • Binary relation: the RHS has two arguments, e.g. $z = x + y$ or $z = xy$ • Work your way backward through the derivatives of the simple relations
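The procedure can be sketched on a deliberately small scalar function, y = tanh(wx + b), which is an illustrative choice and not taken from the slides. The forward pass is split into two binary relations and one unary relation, and the backward pass visits them in reverse order, taking the loss to be y itself so that the starting derivative is 1.

```python
import math


def forward_backward(w, x, b):
    """Compute y = tanh(w*x + b) and its derivatives w.r.t. w, x and b."""
    # Forward: one simple relation per line
    z1 = w * x             # binary: multiplication
    z2 = z1 + b            # binary: addition
    y = math.tanh(z2)      # unary: activation

    # Backward: reverse order, one simple rule per relation
    d_y = 1.0                      # assume L = y
    d_z2 = d_y * (1.0 - y * y)     # through tanh
    d_z1 = d_z2                    # through addition
    d_b = d_z2
    d_w = d_z1 * x                 # through multiplication
    d_x = d_z1 * w
    return y, d_w, d_x, d_b


y, d_w, d_x, d_b = forward_backward(w=0.7, x=-1.3, b=0.2)
```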

  14. Example: LSTM • Full set of LSTM equations, numbered 1-6 (in the order in which they must be computed) • It's actually much cleaner to separate the individual components, so let's do that first

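  15. LSTM • $f_t = \sigma(W_{fC} C_{t-1} + W_{fh} h_{t-1} + W_{fx} x_t + b_f)$ • $i_t = \sigma(W_{iC} C_{t-1} + W_{ih} h_{t-1} + W_{ix} x_t + b_i)$ • $\tilde{C}_t = \tanh(W_{Ch} h_{t-1} + W_{Cx} x_t + b_C)$ • $C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t$ • $o_t = \sigma(W_{oC} C_t + W_{oh} h_{t-1} + W_{ox} x_t + b_o)$ • $h_t = o_t \circ \tanh(C_t)$ • This is the full set of equations in the order in which they must be computed • Let's rewrite these in terms of unary and binary operations

  16. LSTM • $f_t = \sigma(W_{fC} C_{t-1} + W_{fh} h_{t-1} + W_{fx} x_t + b_f)$ • Let's rewrite this in terms of unary and binary operations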

  17. LSTM

  18. LSTM • $i_t = \sigma(W_{iC} C_{t-1} + W_{ih} h_{t-1} + W_{ix} x_t + b_i)$ • Let's rewrite this in terms of unary and binary operations

  19. LSTM • Equations 1-14

  20. LSTM • $\tilde{C}_t = \tanh(W_{Ch} h_{t-1} + W_{Cx} x_t + b_C)$ • Let's rewrite this in terms of unary and binary operations

  21. LSTM • Equations 15-19

  22. LSTM • $C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t$ • Let's rewrite this in terms of unary and binary operations

  23. LSTM • Equations 15-22

  24. LSTM • $o_t = \sigma(W_{oC} C_t + W_{oh} h_{t-1} + W_{ox} x_t + b_o)$ • Let's rewrite this in terms of unary and binary operations

  25. LSTM • Equations 15-29

  26. LSTM • $h_t = o_t \circ \tanh(C_t)$ • Let's rewrite this in terms of unary and binary operations

  27. LSTM • Equations 15-31

  28. LSTM forward • The full forward computation of the LSTM can be performed by computing Equations 1-31 in sequence • Every one of these equations is unary or binary
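One possible serialization of the forward pass into 31 unary or binary equations is sketched below for the scalar case. It assumes an LSTM with peephole connections on the forget, input and output gates, which is the variant consistent with a count of 7 + 7 + 5 + 3 + 7 + 2 = 31 steps; the ordering of steps inside each group, the parameter names such as `W_fC`, and the function name `lstm_forward` are all assumptions rather than something the slides confirm.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_forward(C_prev, h_prev, x, p):
    """Scalar LSTM step written as 31 unary/binary equations.

    `p` maps (hypothetical) parameter names to scalars.
    Returns a dict z where z[k] is the result of Equation k.
    """
    z = {}
    # Equations 1-7: forget gate
    z[1] = p["W_fC"] * C_prev
    z[2] = p["W_fh"] * h_prev
    z[3] = z[1] + z[2]
    z[4] = p["W_fx"] * x
    z[5] = z[3] + z[4]
    z[6] = z[5] + p["b_f"]
    z[7] = sigmoid(z[6])            # f_t
    # Equations 8-14: input gate
    z[8] = p["W_iC"] * C_prev
    z[9] = p["W_ih"] * h_prev
    z[10] = z[8] + z[9]
    z[11] = p["W_ix"] * x
    z[12] = z[10] + z[11]
    z[13] = z[12] + p["b_i"]
    z[14] = sigmoid(z[13])          # i_t
    # Equations 15-19: candidate cell value
    z[15] = p["W_Ch"] * h_prev
    z[16] = p["W_Cx"] * x
    z[17] = z[15] + z[16]
    z[18] = z[17] + p["b_C"]
    z[19] = math.tanh(z[18])        # candidate value
    # Equations 20-22: new cell state
    z[20] = z[7] * C_prev
    z[21] = z[14] * z[19]
    z[22] = z[20] + z[21]           # C_t
    # Equations 23-29: output gate (peephole on the new C_t)
    z[23] = p["W_oC"] * z[22]
    z[24] = p["W_oh"] * h_prev
    z[25] = z[23] + z[24]
    z[26] = p["W_ox"] * x
    z[27] = z[25] + z[26]
    z[28] = z[27] + p["b_o"]
    z[29] = sigmoid(z[28])          # o_t
    # Equations 30-31: new hidden state
    z[30] = math.tanh(z[22])
    z[31] = z[29] * z[30]           # h_t
    return z


names = ["W_fC", "W_fh", "W_fx", "b_f", "W_iC", "W_ih", "W_ix", "b_i",
         "W_Ch", "W_Cx", "b_C", "W_oC", "W_oh", "W_ox", "b_o"]
params = {name: 0.1 * (k + 1) for k, name in enumerate(names)}
z = lstm_forward(C_prev=0.5, h_prev=-0.2, x=1.0, p=params)
C_t, h_t = z[22], z[31]
```

Keeping every intermediate value in `z` is deliberate: the backward pass needs the forward values of each simple relation.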

  29. LSTM • Equations 1-14

  30. LSTM • Equations 15-31

  31. Computing derivatives • We will now work our way backward • We assume the derivatives $\nabla_{h_t} L$ and $\nabla_{C_t} L$ of the loss w.r.t. $h_t$ and $C_t$ are given • We must compute $\nabla_{C_{t-1}} L$, $\nabla_{h_{t-1}} L$ and $\nabla_{x_t} L$ • And also the derivatives w.r.t. the parameters within the cell • Recall: the shape of the derivative for any variable is the transpose of the shape of that variable

  32. LSTM • Forward Equations 23-31, with backward derivative steps 1-2

  33. LSTM • Forward Equations 23-31, with backward derivative steps 1-3

  34. LSTM • Forward Equations 23-31, with backward derivative steps 1-4

  35. LSTM • Forward Equations 23-31, with backward derivative steps 1-6 • Equations highlighted in yellow show derivatives w.r.t. parameters

  36. LSTM • Forward Equations 23-31, with backward derivative steps 7-8

  37. LSTM • Forward Equations 23-31, with backward derivative steps 7-10

  38. LSTM • Forward Equations 23-31, with backward derivative steps 7-12

  39. LSTM • Forward Equations 23-31, with backward derivative steps 7-14

  40. LSTM • Forward Equations 23-31, with backward derivative steps 7-16
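As a hedged sketch of what the backward pass over the last group of forward equations might look like in the scalar case, the function below recomputes one possible version of Equations 23-31 (output gate with a peephole on C_t, then the hidden state) and walks back through them. The step ordering, parameter names and the function name `output_block_backward` are assumptions. Note how the derivative for C_t is incremented rather than overwritten, because C_t feeds both the tanh and the output gate.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def output_block_backward(C_t, h_prev, x, W_oC, W_oh, W_ox, b_o, d_h, d_C):
    """Backward through (assumed) Equations 23-31, scalar case.

    d_h and d_C are the given loss derivatives w.r.t. h_t and C_t.
    """
    # Forward Equations 23-31
    z23 = W_oC * C_t
    z24 = W_oh * h_prev
    z25 = z23 + z24
    z26 = W_ox * x
    z27 = z25 + z26
    z28 = z27 + b_o
    o = sigmoid(z28)           # Eq. 29: o_t
    tC = math.tanh(C_t)        # Eq. 30
    h = o * tC                 # Eq. 31: h_t

    # Backward, from Equation 31 down to Equation 23
    d_o = d_h * tC                     # Eq. 31
    d_tC = d_h * o
    d_C += d_tC * (1.0 - tC * tC)      # Eq. 30: increment the given d_C
    d_z28 = d_o * o * (1.0 - o)        # Eq. 29
    d_z27 = d_z28                      # Eq. 28
    d_b_o = d_z28
    d_z25 = d_z27                      # Eq. 27
    d_z26 = d_z27
    d_W_ox = d_z26 * x                 # Eq. 26
    d_x = d_z26 * W_ox
    d_z23 = d_z25                      # Eq. 25
    d_z24 = d_z25
    d_W_oh = d_z24 * h_prev            # Eq. 24
    d_h_prev = d_z24 * W_oh
    d_W_oC = d_z23 * C_t               # Eq. 23
    d_C += d_z23 * W_oC                # second increment for C_t
    return {"h": h, "d_C": d_C, "d_h_prev": d_h_prev, "d_x": d_x,
            "d_W_oC": d_W_oC, "d_W_oh": d_W_oh, "d_W_ox": d_W_ox,
            "d_b_o": d_b_o}


args = dict(C_t=0.4, h_prev=-0.3, x=0.8, W_oC=0.5, W_oh=-0.7, W_ox=0.2, b_o=0.1)
out = output_block_backward(**args, d_h=1.0, d_C=0.0)
```

The returned `d_h_prev` and `d_x` are only partial results: in the full cell they would keep accumulating as the backward pass continues through the earlier equations.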

  41. LSTM • Forward Equations 15-22, with backward derivative steps 7-8

  42. LSTM • Forward Equations 15-22, with backward derivative steps 7-10

  43. LSTM • Forward Equations 15-22, with backward derivative steps 7-12 • Second time we're computing a derivative for $C_{t-1}$, so we increment the derivative ("+=")

  44. LSTM • Forward Equations 15-22, with backward derivative steps 7-13

  45. LSTM • Forward Equations 15-22, with backward derivative steps 14-15

  46. LSTM • Forward Equations 15-22, with backward derivative steps 14-17

  47. LSTM • Forward Equations 15-22, with backward derivative steps 14-19 • Note the "+="

  48. LSTM • Forward Equations 15-22, with backward derivative steps 14-21 • Note the "+="

  49. Continuing the computation • Continue the backward progression until the derivatives from forward Equation 1 have been computed • At this point all derivatives will be computed.

  50. Overall procedure • Express the overall computation as a sequence of unary or binary operations • Can be automated • Computes derivatives incrementally, going backward over the sequence of equations! • Since each atomic computation is simple and belongs to one of a small set of possibilities, the conversion to derivatives is trivial once the computation is serialized as above
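To illustrate the claim that the procedure can be automated, here is a minimal sketch of a tape that records each unary or binary operation together with its local derivatives and then replays the tape backward. The `Tape` class and its method names are hypothetical and cover scalars only; real automatic-differentiation libraries are considerably more elaborate. The `+=` in `backward` is the addition-of-derivatives rule: a variable used in several operations collects one contribution from each.

```python
import math


class Tape:
    """Records unary/binary scalar operations for a backward replay."""

    def __init__(self):
        self.values = []   # value of every variable, in creation order
        self.ops = []      # (output index, [(input index, local derivative), ...])

    def var(self, value):
        """Register an input variable and return its index."""
        self.values.append(value)
        return len(self.values) - 1

    def _record(self, value, parents):
        out = self.var(value)
        self.ops.append((out, parents))
        return out

    def add(self, a, b):
        return self._record(self.values[a] + self.values[b], [(a, 1.0), (b, 1.0)])

    def mul(self, a, b):
        va, vb = self.values[a], self.values[b]
        return self._record(va * vb, [(a, vb), (b, va)])

    def tanh(self, a):
        t = math.tanh(self.values[a])
        return self._record(t, [(a, 1.0 - t * t)])

    def sigmoid(self, a):
        s = 1.0 / (1.0 + math.exp(-self.values[a]))
        return self._record(s, [(a, s * (1.0 - s))])

    def backward(self, out):
        """Return d(values[out]) / d(values[k]) for every recorded variable k."""
        grads = [0.0] * len(self.values)
        grads[out] = 1.0
        for node, parents in reversed(self.ops):
            for parent, local in parents:
                grads[parent] += grads[node] * local   # incremental "+="
        return grads


t = Tape()
x = t.var(0.5)
w = t.var(2.0)
b = t.var(-1.0)
y = t.tanh(t.add(t.mul(w, x), b))   # y = tanh(w*x + b) = tanh(0) = 0
grads = t.backward(y)
```

With these inputs the pre-activation is exactly zero, so the tanh derivative is 1 and the results are easy to verify: `grads[w]` equals x, `grads[x]` equals w and `grads[b]` equals 1.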
