
CS 4803 / 7643: Deep Learning Topics: (Finish) Computing Gradients



  1. CS 4803 / 7643: Deep Learning Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures Zsolt Kira Georgia Tech

  2. Administrivia • HW1 due date moved! – Due: 02/18, 11:55pm • Project topic submissions – Submit by 02/21 to get comments – Form filled out with: • Members identified • One paragraph on the problem and another on what has been done in the literature and on the approach (note: the approach can be selected from an existing paper and reimplemented) • Description of what each member will do • Link – A project idea: ICLR reproducibility challenge (https://reproducibility-challenge.github.io/iclr_2019/) • Official submission was in January, but you can still do it and submit later! (C) Dhruv Batra and Zsolt Kira 2

  3. • Google cloud credits out! (see piazza for details) • Clouderizer ties in with Google Cloud Platform (GCP) (C) Dhruv Batra and Zsolt Kira 3

  4. Matrix/Vector Derivatives Notation (C) Dhruv Batra and Zsolt Kira 4

  5. (C) Dhruv Batra and Zsolt Kira 5

  6. (C) Dhruv Batra and Zsolt Kira 6

  7. Plan for Today • Topics: – (Finish) Computing Gradients – Backprop in Conv Layers – Forward mode vs Reverse mode AD – Modern CNN Architectures (C) Dhruv Batra and Zsolt Kira 7

  8. Backprop in Convolutional Layers (C) Dhruv Batra and Zsolt Kira 8

  9. How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop” (C) Dhruv Batra and Zsolt Kira 9

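  10. Computational Graph [figure: x and W feed a multiply node (*) that produces the scores s; the hinge loss of s is added (+) to the regularizer R(W) to give the total loss L] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n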

  11. Computational Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra and Zsolt Kira 11 Slide Credit: Marc'Aurelio Ranzato

  12. Directed Acyclic Graphs (DAGs) • Exactly what the name suggests – Directed edges – No (directed) cycles – Underlying undirected cycles okay (C) Dhruv Batra and Zsolt Kira 12

  13. Directed Acyclic Graphs (DAGs) • Concept – Topological Ordering (C) Dhruv Batra and Zsolt Kira 13
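
A topological ordering lists the nodes so that every node appears after all of its inputs, which is the order a forward pass needs (and, reversed, the order a backward pass needs). The sketch below uses Kahn's algorithm on a small hypothetical graph; the node names and the `topological_order` helper are illustrative assumptions, not part of the slides.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes in an order where every edge goes from earlier to later."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Nodes with no incoming edges (the graph inputs) are ready immediately.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order


# Hypothetical graph: x1 feeds both mul and sin, so the underlying
# undirected graph has a cycle, but there is no directed cycle.
nodes = ["x1", "x2", "mul", "sin", "add"]
edges = [("x1", "mul"), ("x2", "mul"), ("x1", "sin"),
         ("mul", "add"), ("sin", "add")]
order = topological_order(nodes, edges)
```

Any order that respects the edges is valid; this one happens to visit inputs first because they start with in-degree zero.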

  14. Directed Acyclic Graphs (DAGs) (C) Dhruv Batra and Zsolt Kira 14

  15. Numerical vs Analytic Gradients Numerical gradient : slow :(, approximate :(, easy to write :) Analytic gradient : fast :), exact :), error-prone :( In practice: Derive analytic gradient, check your implementation with numerical gradient. This is called a gradient check. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
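
A gradient check can be sketched as follows: estimate the gradient with central differences and compare it to the hand-derived one using a relative error. The test function `f`, the step size `h`, and all helper names are assumptions chosen for illustration.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical test function: f(x) = x0^2 + 3 * x0 * x1
    return x[0] ** 2 + 3 * x[0] * x[1]


def analytic_gradient(x):
    # Hand-derived gradient of f: [2*x0 + 3*x1, 3*x0]
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


def relative_error(a, b):
    return abs(a - b) / max(abs(a), abs(b), 1e-12)


point = [1.5, -2.0]
numeric = numerical_gradient(f, point)
analytic = analytic_gradient(point)
errors = [relative_error(n, a) for n, a in zip(numeric, analytic)]
```

The numerical version needs two function evaluations per input dimension, which is why it is used only to verify the analytic gradient rather than to train.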

  16. How do we compute gradients? • Analytic or “Manual” Differentiation • Symbolic Differentiation • Numerical Differentiation • Automatic Differentiation – Forward mode AD – Reverse mode AD • aka “backprop” (C) Dhruv Batra and Zsolt Kira 16

  17. Forward mode vs Reverse Mode • Key Computations (C) Dhruv Batra and Zsolt Kira 17

  18. Forward mode AD

  19. Reverse mode AD

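  20. Example: Forward mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 20

  21. Example: Forward mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 21

  22. Example: Forward mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 22

  23. Example: Forward mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 23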
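
Forward mode AD can be sketched with dual numbers, which carry a value together with its derivative with respect to one chosen input. The example assumes the graph on these slides denotes f(x1, x2) = x1 * x2 + sin(x1); the slide text does not state the formula, so treat that as an assumption. The `Dual` class and helper names are hypothetical.

```python
import math


class Dual:
    """A value paired with its derivative (tangent) w.r.t. one chosen input."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)


def dual_sin(a):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(a.value), math.cos(a.value) * a.tangent)


def f(x1, x2):
    # Assumed function for the slide's graph: x1 * x2 + sin(x1)
    return x1 * x2 + dual_sin(x1)


def forward_mode_gradient(x1, x2):
    # One forward sweep per input: seed that input's tangent with 1.
    df_dx1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).tangent
    df_dx2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).tangent
    return df_dx1, df_dx2


g1, g2 = forward_mode_gradient(2.0, 3.0)
```

Note that two sweeps were needed for two inputs; the number of sweeps grows with the number of inputs.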

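  24. Example: Reverse mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 24

  25. Example: Reverse mode AD [figure: computational graph with nodes +, sin( ), and * over inputs x_1 and x_2] (C) Dhruv Batra and Zsolt Kira 25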
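
Reverse mode AD on the same graph can be written out by hand: one forward pass that stores intermediates, then one backward pass from the output to the inputs. As before, the formula f(x1, x2) = x1 * x2 + sin(x1) is an assumption about what the slide's graph denotes, and the function name is hypothetical.

```python
import math


def value_and_gradient(x1, x2):
    # Forward pass: compute and keep every intermediate value.
    a = x1 * x2          # multiply node
    b = math.sin(x1)     # sin node
    out = a + b          # add node

    # Backward pass: start from d(out)/d(out) = 1 and walk the graph in reverse.
    d_out = 1.0
    d_a = d_out          # add node passes the gradient through unchanged
    d_b = d_out
    d_x1 = d_a * x2      # multiply node: gradient w.r.t. x1 is x2
    d_x2 = d_a * x1      # multiply node: gradient w.r.t. x2 is x1
    d_x1 += d_b * math.cos(x1)  # x1 also feeds sin, so contributions add
    return out, (d_x1, d_x2)


value, (r1, r2) = value_and_gradient(2.0, 3.0)
```

A single backward sweep yields the derivatives with respect to both inputs at once.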

  26. Forward Pass vs Forward mode AD vs Reverse Mode AD [figure: the graph with nodes +, sin( ), and * over inputs x_1 and x_2, shown three times, once per pass] (C) Dhruv Batra and Zsolt Kira 26

  27. Forward mode vs Reverse Mode • What are the differences? [figure: the same graph shown twice, once per mode] (C) Dhruv Batra and Zsolt Kira 27

  28. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? (C) Dhruv Batra and Zsolt Kira 28

  29. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? • Which one is more memory efficient (less storage)? – Forward or backward? (C) Dhruv Batra and Zsolt Kira 29
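
The slide poses these as questions; the following is the standard textbook summary rather than the slide's own answer. For a function from n inputs to m outputs with Jacobian J (an m-by-n matrix), one forward-mode sweep pushes a tangent vector through J, while one reverse-mode sweep pulls an adjoint vector back through its transpose:

$$
\text{forward mode: } \dot{y} = J\,\dot{x}
\qquad\qquad
\text{reverse mode: } \bar{x} = J^{\top}\,\bar{y}
$$

Recovering the full Jacobian therefore takes n forward sweeps (one per input) or m reverse sweeps (one per output). With a scalar loss (m = 1) and many parameters, reverse mode gives the whole gradient in one sweep, which is why it is the usual choice in deep learning. The trade-off is memory: reverse mode has to keep the forward-pass intermediates until the backward pass uses them, whereas forward mode can carry tangents alongside values and discard both as it goes.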

  30. Patterns in backward flow Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  31. Patterns in backward flow Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  32. Patterns in backward flow add gate: gradient distributor Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  33. Patterns in backward flow add gate: gradient distributor Q: What is a max gate? Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  34. Patterns in backward flow add gate: gradient distributor max gate: gradient router Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  35. Patterns in backward flow add gate: gradient distributor max gate: gradient router Q: What is a mul gate? Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  36. Patterns in backward flow add gate: gradient distributor max gate: gradient router mul gate: gradient switcher Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
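
The three gate patterns can be written as tiny backward functions that map an upstream gradient to the gradients of the two inputs. The function names are hypothetical, and the tie-breaking rule in `max_backward` (the first input wins on equality) is an assumption.

```python
def add_backward(upstream):
    # Add gate: distributes the upstream gradient unchanged to both inputs.
    return upstream, upstream


def max_backward(x, y, upstream):
    # Max gate: routes the whole gradient to the larger input, zero to the other.
    return (upstream, 0.0) if x >= y else (0.0, upstream)


def mul_backward(x, y, upstream):
    # Mul gate: each input receives the upstream gradient scaled by the OTHER input.
    return upstream * y, upstream * x


add_grads = add_backward(2.0)
max_grads = max_backward(4.0, -1.0, 2.0)
mul_grads = mul_backward(3.0, -4.0, 2.0)
```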

  37. Gradients add at branches + Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  38. Duality in Fprop and Bprop [figure: a SUM (+) in FPROP becomes a COPY in BPROP, and a COPY in FPROP becomes a SUM (+) in BPROP] (C) Dhruv Batra and Zsolt Kira 38
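
The duality, and the rule that gradients add at branches, can be illustrated with a small sketch: copying a value in the forward pass means summing gradients in the backward pass, and summing values in the forward pass means copying the gradient in the backward pass. All function names here are hypothetical.

```python
def copy_forward(x):
    # Forward: one value fans out to two branches.
    return x, x


def copy_backward(d_branch1, d_branch2):
    # Backward: gradients from the two branches are summed.
    return d_branch1 + d_branch2


def sum_forward(a, b):
    # Forward: two values are summed.
    return a + b


def sum_backward(d_out):
    # Backward: the gradient is copied to both inputs.
    return d_out, d_out


# y = x * x, computed by copying x into two branches and multiplying them.
x = 3.0
a, b = copy_forward(x)
y = a * b
d_a, d_b = b * 1.0, a * 1.0      # multiply gate backward with upstream gradient 1
d_x = copy_backward(d_a, d_b)    # contributions add at the branch: 2 * x
```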

  39. Modularized implementation: forward / backward API Graph (or Net) object (rough pseudocode) 39 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
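
A minimal sketch of such a graph object, assuming a simple chain of single-input gates that is already in topological order. The `Scale`, `Square`, and `Graph` classes are hypothetical and only illustrate the forward/backward loop structure, not any particular framework.

```python
class Scale:
    """Hypothetical gate: multiplies its input by a fixed constant k."""

    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return self.k * x

    def backward(self, d_out):
        return self.k * d_out


class Square:
    """Hypothetical gate: squares its input and caches it for backward."""

    def forward(self, x):
        self.x = x
        return x * x

    def backward(self, d_out):
        return 2 * self.x * d_out


class Graph:
    def __init__(self, gates):
        self.gates = gates  # assumed to be topologically sorted

    def forward(self, x):
        for gate in self.gates:
            x = gate.forward(x)
        return x  # output of the final gate, e.g. the loss

    def backward(self, d_out=1.0):
        # Visit gates in reverse order, applying the chain rule at each one.
        for gate in reversed(self.gates):
            d_out = gate.backward(d_out)
        return d_out  # gradient w.r.t. the graph input


net = Graph([Scale(3.0), Square()])
loss = net.forward(2.0)   # (3 * 2)^2
grad = net.backward()     # d/dx (3x)^2 = 18x
```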

  40. Modularized implementation: forward / backward API [figure: multiply gate (*) with inputs x and y and output z] (x, y, z are scalars) 40 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  41. Modularized implementation: forward / backward API [figure: multiply gate (*) with inputs x and y and output z] (x, y, z are scalars) 41 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
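
The multiply gate for scalars x, y, z can be sketched as a class with `forward` and `backward` methods. The class name and the choice to cache inputs on `self` are illustrative assumptions; the key point is that `backward` needs the values seen in `forward`.

```python
class MultiplyGate:
    def forward(self, x, y):
        # Cache the inputs: the backward pass needs them.
        self.x = x
        self.y = y
        return x * y

    def backward(self, dz):
        # Chain rule: dL/dx = dL/dz * dz/dx, where dz/dx = y (and symmetrically for y).
        dx = self.y * dz
        dy = self.x * dz
        return dx, dy


gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dx, dy = gate.backward(2.0)   # upstream gradient dL/dz = 2
```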

  42. Example: Caffe layers Caffe is licensed under BSD 2-Clause 42 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  43. Caffe Sigmoid Layer * top_diff (chain rule) Caffe is licensed under BSD 2-Clause 43 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
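
The chain-rule step highlighted on this slide (local sigmoid gradient multiplied by `top_diff`) can be sketched in plain Python. This is an analogue for illustration only, not Caffe's actual C++ code or API; the class and method names are assumptions that borrow Caffe's bottom/top vocabulary.

```python
import math


class SigmoidLayer:
    def forward(self, bottom):
        # Elementwise sigmoid; cache the outputs for the backward pass.
        self.top = [1.0 / (1.0 + math.exp(-v)) for v in bottom]
        return self.top

    def backward(self, top_diff):
        # Local gradient of sigmoid is y * (1 - y); multiply by the upstream
        # gradient top_diff (chain rule) to get the gradient w.r.t. the input.
        return [d * y * (1.0 - y) for d, y in zip(top_diff, self.top)]


layer = SigmoidLayer()
top = layer.forward([0.0])
bottom_diff = layer.backward([2.0])
```

Expressing the local gradient in terms of the cached output avoids recomputing the exponential during the backward pass.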

  44. (C) Dhruv Batra and Zsolt Kira 44 Figure Credit: Andrea Vedaldi
