
Full Stack Deep Learning: Troubleshooting Deep Neural Networks

Josh Tobin, Sergey Karayev, Pieter Abbeel

Lifecycle of an ML project: cross-project infrastructure (team & hiring) and per-project activities (planning & project setup, data)


  1. Strategy for DL troubleshooting: Start simple → Implement & debug → Evaluate → Meets requirements. If evaluation falls short, tune hyperparameters or improve model/data, then loop back. Full Stack Deep Learning (March 2019), Pieter Abbeel, Sergey Karayev, Josh Tobin. L6: Troubleshooting

  2. 1. Start simple. Starting simple, steps:
     (a) Choose a simple architecture
     (b) Use sensible defaults
     (c) Normalize inputs
     (d) Simplify the problem

  3. 1. Start simple. Demystifying architecture selection:
     • Images: start with a LeNet-like architecture; consider ResNet later
     • Sequences: start with an LSTM with one hidden layer (or temporal convs); consider an attention model or WaveNet-like model later
     • Other: start with a fully connected neural net with one hidden layer; what to use later is problem-dependent

  4. 1. Start simple. Dealing with multiple input modalities (three example inputs; Input 3 is the word sequence “This”, “is”, “a”, “cat”):
     1. Map each input into a lower-dimensional feature space: Input 1 → ConvNet → Flatten (64-dim); Input 2 (72-dim); Input 3 → LSTM (48-dim)
     2. Concatenate the features (64 + 72 + 48 = 184-dim)
     3. Pass through fully connected layers (FC, FC; 256-dim and 128-dim) to the output (T/F)
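The three steps above can be sketched with plain NumPy arrays standing in for the ConvNet, LSTM, and raw features. This is only a shape-level illustration: the random feature arrays, the `dense` helper, and the ordering of the two fully connected layers (256-dim then 128-dim) are assumptions, not something the slides specify.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4

# Hypothetical per-modality features, using the dimensions from the slides.
conv_features = rng.normal(size=(batch, 64))   # Input 1 after ConvNet + Flatten
raw_features = rng.normal(size=(batch, 72))    # Input 2, used as-is
lstm_features = rng.normal(size=(batch, 48))   # Input 3 after the LSTM

# Step 2: concatenate along the feature axis (64 + 72 + 48 = 184).
joint = np.concatenate([conv_features, raw_features, lstm_features], axis=1)


def dense(x, n_out, rng):
    """One fully connected layer with ReLU and He-normal weights (illustrative)."""
    w = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], n_out))
    return np.maximum(x @ w, 0.0)


# Step 3: two fully connected layers (order assumed), then one logit for T/F.
hidden = dense(dense(joint, 256, rng), 128, rng)
w_out = rng.normal(0.0, np.sqrt(1.0 / 128), size=(128, 1))
logit = hidden @ w_out

print(joint.shape, hidden.shape, logit.shape)
```

In a real model each stand-in array would be produced by a trainable sub-network, but the concatenation step stays the same.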


  10. 1. Start simple. Recommended network / optimizer defaults:
     • Optimizer: Adam with learning rate 3e-4
     • Activations: ReLU (FC and conv models), tanh (LSTMs)
     • Initialization: He et al. normal (ReLU), Glorot normal (tanh)
     • Regularization: none
     • Data normalization: none

  11. 1. Start simple. Definitions of recommended initializers ($n$ is the number of inputs, $m$ is the number of outputs; the second argument is the standard deviation):
     • He et al. normal (used for ReLU): $\mathcal{N}\left(0, \sqrt{2/n}\right)$
     • Glorot normal (used for tanh): $\mathcal{N}\left(0, \sqrt{2/(n+m)}\right)$
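As a minimal sketch of the two definitions, the following NumPy code samples weight matrices with the stated standard deviations. The function names and layer sizes are illustrative assumptions; deep learning frameworks ship their own initializers, which should be preferred in practice.

```python
import numpy as np


def he_normal(n_in, n_out, rng):
    """Sample weights from N(0, sqrt(2 / n_in)); the second argument is the std."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))


def glorot_normal(n_in, n_out, rng):
    """Sample weights from N(0, sqrt(2 / (n_in + n_out)))."""
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))


rng = np.random.default_rng(0)
w_relu = he_normal(512, 256, rng)      # for a ReLU layer (sizes are hypothetical)
w_tanh = glorot_normal(512, 256, rng)  # for a tanh layer

# The empirical std should be close to the target std of each initializer.
print(w_relu.std(), np.sqrt(2.0 / 512))
print(w_tanh.std(), np.sqrt(2.0 / (512 + 256)))
```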


  13. 1. Start simple. Important to normalize scale of input data:
     • Subtract the mean and divide by the standard deviation
     • For images, fine to scale values to [0, 1] or [-0.5, 0.5] (e.g., by dividing by 255)
     Careful: make sure your library doesn't already do it for you!
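A small sketch of both normalization options, assuming NumPy arrays as input. The helper names and the example data are hypothetical; the one design point worth noting is that statistics are computed on the training set only and reused for other splits.

```python
import numpy as np


def standardize(x_train, x_other, eps=1e-8):
    """Subtract the training mean and divide by the training std, per feature."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    return (x_train - mean) / (std + eps), (x_other - mean) / (std + eps)


def scale_images(images_uint8):
    """Scale uint8 pixel values to [0, 1] as float32."""
    return images_uint8.astype(np.float32) / 255.0


rng = np.random.default_rng(0)
x_train = rng.normal(5.0, 3.0, size=(1000, 4))  # made-up tabular features
x_val = rng.normal(5.0, 3.0, size=(100, 4))
x_train_n, x_val_n = standardize(x_train, x_val)

images = np.array([[0, 128, 255]], dtype=np.uint8)  # made-up pixel values
scaled = scale_images(images)

print(x_train_n.mean(axis=0), x_train_n.std(axis=0))
print(scaled.min(), scaled.max(), scaled.dtype)
```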


  15. 1. Start simple. Consider simplifying the problem as well:
     • Start with a small training set (~10,000 examples)
     • Use a fixed number of objects, classes, image size, etc.
     • Create a simpler synthetic training set

  16. 1. Start simple. Running example: simplest model for pedestrian detection (labels: 0 = no pedestrian, 1 = pedestrian). Goal: 99% classification accuracy.
     • Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test
     • Use a LeNet architecture with sigmoid cross-entropy loss
     • Adam optimizer with LR 3e-4
     • No regularization
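The sigmoid cross-entropy loss in the running example is usually computed directly from logits. The sketch below uses the standard numerically stable rearrangement, max(x, 0) - x*z + log(1 + exp(-|x|)); the function name and the test values are illustrative assumptions, and in practice the framework's built-in loss is the safer choice.

```python
import numpy as np


def sigmoid_cross_entropy_with_logits(logits, labels):
    """Per-example binary cross-entropy computed from raw logits.

    Uses max(x, 0) - x * z + log(1 + exp(-|x|)), which never exponentiates
    a large positive number and therefore stays finite.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    return np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


# Includes extreme logits that would overflow a naive exp-based formula.
loss = sigmoid_cross_entropy_with_logits(
    [0.0, 100.0, -100.0, 1000.0],
    [1.0, 1.0, 0.0, 0.0],
)
print(loss)
```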

  17. 1. Start simple. Starting simple, summary:
     (a) Choose a simple architecture: LeNet, LSTM, or fully connected
     (b) Use sensible defaults: Adam optimizer & no regularization
     (c) Normalize inputs: subtract mean and divide by std, or just divide by 255 (images)
     (d) Simplify the problem: start with a simpler version of your problem (e.g., smaller dataset)


  19. 2. Implement & debug. Implementing bug-free DL models, steps:
     (a) Get your model to run
     (b) Overfit a single batch
     (c) Compare to a known result

  20. 2. Implement & debug. Preview: the five most common DL bugs:
     • Incorrect shapes for your tensors. Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x+y).shape = (None, None)
     • Pre-processing inputs incorrectly. E.g., forgetting to normalize, or too much pre-processing
     • Incorrect input to your loss function. E.g., softmaxed outputs to a loss that expects logits
     • Forgot to set up train mode for the net correctly. E.g., toggling train/eval, controlling batch norm dependencies
     • Numerical instability (inf/NaN). Often stems from using an exp, log, or div operation
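The accidental-broadcasting bug from the first bullet can be reproduced in a few lines of NumPy. The prediction and target values are made up for illustration; the point is that the mismatched shapes raise no error and simply produce a different loss.

```python
import numpy as np

preds = np.array([0.1, 0.4, 0.9])           # shape (3,)
targets = np.array([[0.0], [0.0], [1.0]])   # shape (3, 1)

# Bug: (3,) minus (3, 1) silently broadcasts to a (3, 3) matrix of all pairs.
diff = preds - targets
bad_mse = np.mean(diff ** 2)

# Fix: make the shapes agree explicitly before combining the arrays.
flat_targets = targets.reshape(-1)
assert preds.shape == flat_targets.shape
good_mse = np.mean((preds - flat_targets) ** 2)

print(diff.shape, bad_mse, good_mse)
```

Asserting on shapes right before the loss is a cheap guard against this class of silent failure.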

  21. 2. Implement & debug. General advice for implementing your model:
     Lightweight implementation
     • Minimum possible new lines of code for v1
     • Rule of thumb: <200 lines
     • (Tested infrastructure components are fine)
     Use off-the-shelf components, e.g.,
     • Keras
     • tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x))
     • tf.losses.cross_entropy(…) instead of writing out the exp
     Build complicated data pipelines later
     • Start with a dataset you can load into memory


  25. 2. Implement & debug. Get your model to run: common issues and recommended resolution:
     • Shape mismatch, casting issue: step through model creation and inference in a debugger
     • OOM: scale back memory-intensive operations one-by-one
     • Other: standard debugging toolkit (Stack Overflow + interactive debugger)

  27. 2. Implement & debug. Debuggers for DL code:
     • PyTorch: easy, use ipdb
     • TensorFlow: trickier
       Option 1: step through graph creation
       Option 2: step into the training loop and evaluate tensors using sess.run(…)
       Option 3: use tfdbg (python -m tensorflow.python.debug.examples.debug_mnist --debug), which stops execution at each sess.run(…) and lets you inspect


  31. 2. Implement & debug. Shape mismatch: most common causes:
     Undefined shapes
     • Confusing tensor.shape, tf.shape(tensor), tensor.get_shape()
     • Reshaping things to a shape of type Tensor (e.g., when loading data from a file)
     Incorrect shapes
     • Flipped dimensions when using tf.reshape(…)
     • Took sum, average, or softmax over wrong dimension
     • Forgot to flatten after conv layers
     • Forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4))
     • Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, and loaded it as a float32)
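Two of the incorrect-shape causes above, softmax over the wrong dimension and leftover “1” dimensions, can be illustrated in NumPy. The `softmax` helper and the example arrays are assumptions made for this sketch.

```python
import numpy as np


def softmax(logits, axis):
    """Softmax along the given axis, shifted by the max for stability."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])   # shape (batch=2, classes=3)

right = softmax(logits, axis=1)   # normalizes over classes: each row sums to 1
wrong = softmax(logits, axis=0)   # normalizes over the batch: runs without error

print(right.sum(axis=1), wrong.sum(axis=1))

# Extra singleton dimensions, e.g. (batch, 1, 1, 4) after a conv stack:
features = np.zeros((8, 1, 1, 4))
flat = features.reshape(features.shape[0], -1)   # flatten everything but batch
print(flat.shape)
```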

  32. 2. Implement & debug. Casting issue: most common causes:
     Data not in float32
     • Forgot to cast images from uint8 to float32
     • Generated data using numpy in float64, forgot to cast to float32
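Both casting issues are easy to demonstrate with NumPy. The pixel values below are made up; the sketch shows uint8 arithmetic wrapping around silently and NumPy's float64 default for generated data.

```python
import numpy as np

# Hypothetical pixel values stored as uint8, as image loaders commonly return.
pixels = np.array([200, 250, 10], dtype=np.uint8)
offset = np.array([100, 100, 100], dtype=np.uint8)

wrapped = pixels + offset   # uint8 arithmetic wraps around modulo 256
safe = pixels.astype(np.float32) + offset.astype(np.float32)
print(wrapped, safe)

# Data generated with NumPy defaults to float64 and needs an explicit cast.
generated = np.random.default_rng(0).normal(size=(2, 3))
generated32 = generated.astype(np.float32)
print(generated.dtype, generated32.dtype)
```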

  33. 2. Implement & debug. OOM: most common causes:
     Too big a tensor
     • Too large a batch size for your model (e.g., during evaluation)
     • Too large fully connected layers
     Too much data
     • Loading too large a dataset into memory, rather than using an input queue
     • Allocating too large a buffer for dataset creation
     Duplicating operations
     • Memory leak due to creating multiple models in the same session
     • Repeatedly creating an operation (e.g., in a function that gets called over and over again)
     Other processes
     • Other processes running on your GPU

  34. 2. Implement & debug. Other bugs: other common errors:
     • Forgot to initialize variables
     • Forgot to turn off bias when using batch norm
     • “Fetch argument has invalid type”: usually you overwrote one of your ops with an output during training


  36. 2. Implement & debug. Overfit a single batch: common issues and their most common causes:
     • Error goes up: flipped the sign of the loss function / gradient; learning rate too high; softmax taken over wrong dimension
     • Error explodes: numerical issue (check all exp, log, and div operations); learning rate too high
     • Error oscillates: data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly); learning rate too high
     • Error plateaus: learning rate too low; gradients not flowing through the whole model; too much regularization; incorrect input to loss function (e.g., softmax instead of logits, accidentally add ReLU on output); data or labels corrupted
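The overfit-a-single-batch check can be sketched with a tiny logistic-regression model in NumPy. Everything here is an illustrative assumption (the random batch, the model, the learning rate, and the step count); with a real network the same idea applies: train repeatedly on one fixed batch and confirm the loss heads toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed, tiny batch: 8 examples, 32 features, random binary labels.
x = rng.normal(size=(8, 32)).astype(np.float32)
y = rng.integers(0, 2, size=8).astype(np.float32)

w = np.zeros(32, dtype=np.float32)
b = 0.0
lr = 0.1


def loss_and_grads(w, b):
    """Stable sigmoid cross-entropy on the batch, plus its gradients."""
    logits = x @ w + b
    loss = np.mean(
        np.maximum(logits, 0.0) - logits * y + np.log1p(np.exp(-np.abs(logits)))
    )
    err = 1.0 / (1.0 + np.exp(-logits)) - y   # d(loss)/d(logits), unaveraged
    return loss, x.T @ err / len(y), err.mean()


losses = []
for step in range(500):
    loss, grad_w, grad_b = loss_and_grads(w, b)
    losses.append(float(loss))
    w -= lr * grad_w
    b -= lr * grad_b

# A healthy setup drives the loss on this one batch close to zero.
print(losses[0], losses[-1])
```

If the curve instead rises, explodes, oscillates, or plateaus, the list above gives the first causes to check.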


  43. 2. Implement & debug. Hierarchy of known results, from more useful to less useful, and what you can do with each:
     • Official model implementation evaluated on similar dataset to yours: walk through code line-by-line and ensure you have the same output; ensure your performance is up to par with expectations
     • Official model implementation evaluated on benchmark (e.g., MNIST): walk through code line-by-line and ensure you have the same output
     • Unofficial model implementation: same as before, but with lower confidence
     • Results from the paper (with no code): ensure your performance is up to par with expectations
     • Results from your model on a benchmark dataset (e.g., MNIST): make sure your model performs well in a simpler setting
     • Results from a similar model on a similar dataset: get a general sense of what kind of performance can be expected
     • Super simple baselines (e.g., average of outputs or linear regression): make sure your model is learning anything at all
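For the last entry in the hierarchy, one possible super simple baseline for a classifier is always predicting the most common training label. The sketch below uses only the standard library; the label counts are made up and the function name is hypothetical.

```python
from collections import Counter


def majority_class_accuracy(train_labels, eval_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in eval_labels if label == majority_label)
    return hits / len(eval_labels)


# Hypothetical label counts for an imbalanced binary problem.
train_labels = [0] * 900 + [1] * 100
val_labels = [0] * 85 + [1] * 15

baseline = majority_class_accuracy(train_labels, val_labels)
print(baseline)
```

A trained model that cannot beat this number is probably not learning anything useful yet.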

  51. 2. Implement & debug. Summary: how to implement & debug:
     (a) Get your model to run: step through in debugger & watch out for shape, casting, and OOM errors
     (b) Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors
     (c) Compare to a known result: keep iterating until model performs up to expectations


  53. 3. Evaluate. Bias-variance decomposition


  56. 3. Evaluate. Bias-variance decomposition. [Bar chart: breakdown of test error by source] Irreducible error 25 + avoidable bias (i.e., underfitting) 2 = train error 27; + variance (i.e., overfitting) 5 = val error 32; + val set overfitting 2 = test error 34

  57. 3. Evaluate. Bias-variance decomposition. Test error = irreducible error + bias + variance + val overfitting. This assumes train, val, and test all come from the same distribution. What if not?

  58. 3. Evaluate. Handling distribution shift (train data vs. test data): use two val sets, one sampled from the training distribution and one from the test distribution
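One way to carve out the two validation sets is sketched below with the standard library only. The split fractions, the pool sizes, and the `make_splits` name are assumptions for illustration; the slides do not prescribe how large each set should be.

```python
import random


def make_splits(train_pool, test_pool, val_fraction=0.1, seed=0):
    """Split two data pools into train, train-val, test-val, and test sets.

    train_pool holds examples from the training distribution and test_pool
    holds examples from the test distribution (both hypothetical here).
    """
    rng = random.Random(seed)
    train_pool, test_pool = list(train_pool), list(test_pool)
    rng.shuffle(train_pool)
    rng.shuffle(test_pool)
    n_train_val = int(len(train_pool) * val_fraction)
    n_test_val = len(test_pool) // 2
    return {
        "train": train_pool[n_train_val:],
        "train_val": train_pool[:n_train_val],   # same distribution as train
        "test_val": test_pool[:n_test_val],      # same distribution as test
        "test": test_pool[n_test_val:],
    }


# Integer ids stand in for real examples.
splits = make_splits(range(1000), range(1000, 1200))
print({name: len(items) for name, items in splits.items()})
```

The gap between train-val and test-val error then estimates how much the distribution shift is costing.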

  59. 3. Evaluate. The bias-variance tradeoff

  60. 3. Evaluate. Bias-variance with distribution shift

  61. 3. Evaluate. Bias-variance with distribution shift. [Bar chart: breakdown of test error by source] Irreducible error 25 + avoidable bias (i.e., underfitting) 2 = train error 27; + variance 2 = train val error 29; + distribution shift 3 = test val error 32; + val overfitting 2 = test error 34

  62. 3. Evaluate. Running example: train, val, and test error for pedestrian detection (0 = no pedestrian, 1 = pedestrian; goal: 99% classification accuracy):
     • Goal performance: 1% error
     • Train error: 20%. Train - goal = 19% (under-fitting)
     • Validation error: 27%. Val - train = 7% (over-fitting)
     • Test error: 28%. Test - val = 1% (looks good!)

  65. 3. Evaluate. Summary: evaluating model performance. Test error = irreducible error + bias + variance + distribution shift + val overfitting
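The decomposition can be turned into simple arithmetic on measured error rates. In this sketch the function name is hypothetical, the goal performance is used as a stand-in for irreducible error, and the second call uses illustrative values for the case with separate train-val and test-val sets.

```python
def error_breakdown(goal, train, test_val, test, train_val=None):
    """Split test error (in percentage points) into the sources in the summary.

    If train_val is None, a single val set is assumed and train, val, and
    test are taken to come from the same distribution.
    """
    parts = {
        "goal / irreducible error": goal,
        "avoidable bias (underfitting)": train - goal,
    }
    if train_val is None:
        parts["variance (overfitting)"] = test_val - train
    else:
        parts["variance (overfitting)"] = train_val - train
        parts["distribution shift"] = test_val - train_val
    parts["val overfitting"] = test - test_val
    return parts


# Pedestrian-detection running example: goal 1%, train 20%, val 27%, test 28%.
running = error_breakdown(goal=1, train=20, test_val=27, test=28)

# Illustrative values with two val sets (train-val 29%, test-val 32%).
shifted = error_breakdown(goal=25, train=27, train_val=29, test_val=32, test=34)

for name, value in running.items():
    print(name, value)
```

The largest term points at what to work on next; in the running example it is the underfitting gap.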

