Strategy for DL troubleshooting

Start simple → Implement & debug → Evaluate, then loop between improving the model/data and tuning hyper-parameters until the model meets requirements.

Full Stack Deep Learning (March 2019), Pieter Abbeel, Sergey Karayev, Josh Tobin, L6: Troubleshooting
1. Start simple

Steps:
a. Choose a simple architecture
b. Use sensible defaults
c. Normalize inputs
d. Simplify the problem
Demystifying architecture selection:
• Images: start with a LeNet-like architecture; consider ResNet later
• Sequences: start with an LSTM with one hidden layer (or temporal convs); consider an attention model or WaveNet-like model later
• Other: start with a fully connected neural net with one hidden layer; later choices are problem-dependent
Dealing with multiple input modalities (e.g., Input 1: an image; Input 2: the text “This is a cat”; Input 3: other features)
1. Map each input into a lower-dimensional feature space: e.g., the image through a ConvNet + Flatten, the text (“This is a cat”) through an LSTM. The slide's branches produce 64-dim, 72-dim, and 48-dim features.
2. Concatenate (“Con-cat”) the features into a single 184-dim vector (64 + 72 + 48).
3. Pass the concatenated vector through fully connected layers to the output: 184-dim → FC (256-dim) → FC (128-dim) → T/F output.
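Sketched in plain numpy (the encoder functions here are random-output placeholders standing in for a real ConvNet/LSTM, and the weight scales are arbitrary), the three steps look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the per-modality encoders; feature
# sizes follow the slide (64, 72, 48).
def encode_image(img):       # e.g. ConvNet + Flatten
    return rng.standard_normal(64)

def encode_text(tokens):     # e.g. LSTM final hidden state
    return rng.standard_normal(72)

def encode_other(meta):      # e.g. small fully connected net
    return rng.standard_normal(48)

# 1. Map each input into a lower-dimensional feature space
f_img = encode_image("pixels")
f_txt = encode_text(["This", "is", "a", "cat"])
f_oth = encode_other("metadata")

# 2. Concatenate: 64 + 72 + 48 = 184 dims
features = np.concatenate([f_img, f_txt, f_oth])

# 3. Pass through fully connected layers to a single T/F logit
W1, b1 = 0.01 * rng.standard_normal((256, 184)), np.zeros(256)
W2, b2 = 0.01 * rng.standard_normal((128, 256)), np.zeros(128)
W3, b3 = 0.01 * rng.standard_normal((1, 128)), np.zeros(1)
h1 = np.maximum(0.0, W1 @ features + b1)        # ReLU
h2 = np.maximum(0.0, W2 @ h1 + b2)
prob = 1.0 / (1.0 + np.exp(-(W3 @ h2 + b3)))    # sigmoid -> P(positive)
print(features.shape, prob.shape)               # (184,) (1,)
```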
b. Use sensible defaults
Recommended network / optimizer defaults:
• Optimizer: Adam with learning rate 3e-4
• Activations: ReLU (FC and conv models), tanh (LSTMs)
• Initialization: He et al. normal (ReLU), Glorot normal (tanh)
• Regularization: none
• Data normalization: none
Definitions of recommended initializers (n is the number of inputs, m is the number of outputs):
• He et al. normal (used for ReLU): weights drawn from a normal distribution with mean 0 and standard deviation sqrt(2/n)
• Glorot normal (used for tanh): weights drawn from a normal distribution with mean 0 and standard deviation sqrt(2/(n+m))
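As a sanity check, both initializers are easy to reproduce in numpy (a sketch; the layer dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def he_normal(n, m):
    # He et al. normal (for ReLU): mean 0, std sqrt(2 / n)
    return rng.normal(0.0, np.sqrt(2.0 / n), size=(n, m))

def glorot_normal(n, m):
    # Glorot normal (for tanh): mean 0, std sqrt(2 / (n + m))
    return rng.normal(0.0, np.sqrt(2.0 / (n + m)), size=(n, m))

W_he = he_normal(512, 256)
W_gl = glorot_normal(512, 256)
# Empirical stds should be close to the target values
print(W_he.std(), np.sqrt(2.0 / 512))          # both ~0.0625
print(W_gl.std(), np.sqrt(2.0 / (512 + 256)))  # both ~0.051
```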
c. Normalize inputs
It is important to normalize the scale of input data:
• Subtract the mean and divide by the standard deviation
• For images, it is fine to scale values to [0, 1] or [-0.5, 0.5] (e.g., by dividing by 255)
• Careful: make sure your library doesn't already do it for you!
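A minimal numpy sketch of both normalization options:

```python
import numpy as np

# Generic features: subtract the mean, divide by the standard deviation
x = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 600.0]])
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)
print(x_norm.mean(axis=0), x_norm.std(axis=0))  # ~[0 0] [1 1]

# Images: just dividing by 255 to get [0, 1] is fine
img = np.array([[0, 128, 255]], dtype=np.uint8)
img01 = img.astype(np.float32) / 255.0  # cast explicitly; see the float32 bugs later
print(img01.min(), img01.max())         # 0.0 1.0
```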
d. Simplify the problem
Consider simplifying the problem as well:
• Start with a small training set (~10,000 examples)
• Use a fixed number of objects, classes, image size, etc.
• Create a simpler synthetic training set
Running example: simplest model for pedestrian detection (labels: 0 = no pedestrian, 1 = yes pedestrian; goal: 99% classification accuracy)
• Start with a subset of 10,000 images for training, 1,000 for validation, and 500 for test
• Use a LeNet architecture with sigmoid cross-entropy loss
• Adam optimizer with learning rate 3e-4
• No regularization
Starting simple: summary
a. Choose a simple architecture: LeNet, LSTM, or fully connected
b. Use sensible defaults: Adam optimizer & no regularization
c. Normalize inputs: subtract mean and divide by std, or just divide by 255 (images)
d. Simplify the problem: start with a simpler version of your problem (e.g., smaller dataset)
2. Implement & debug

Implementing bug-free DL models. Steps:
a. Get your model to run
b. Overfit a single batch
c. Compare to a known result
Preview: the five most common DL bugs
• Incorrect shapes for your tensors. Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x + y).shape = (None, None)
• Pre-processing inputs incorrectly. E.g., forgetting to normalize, or too much pre-processing
• Incorrect input to your loss function. E.g., softmaxed outputs to a loss that expects logits
• Forgot to set up train mode for the net correctly. E.g., toggling train/eval, controlling batch norm dependencies
• Numerical instability (inf/NaN). Often stems from using an exp, log, or div operation
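The broadcasting bug is easy to reproduce in numpy (concrete sizes used here in place of None):

```python
import numpy as np

# Accidental broadcasting: a rank-1 vector plus a column vector silently
# produces a matrix instead of raising a shape error.
x = np.zeros(8)          # shape (8,)   -- think (None,)
y = np.zeros((8, 1))     # shape (8, 1) -- think (None, 1)
print((x + y).shape)     # (8, 8) -- not (8,) or (8, 1)!

# Fix: make the shapes match explicitly before combining
z = x.reshape(-1, 1) + y
print(z.shape)           # (8, 1)
```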
General advice for implementing your model
Lightweight implementation:
• Minimum possible new lines of code for v1
• Rule of thumb: <200 lines
• (Tested infrastructure components are fine)
Use off-the-shelf components, e.g., Keras, or:
• tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x))
• tf.losses.cross_entropy(…) instead of writing out the exp
Build complicated data pipelines later:
• Start with a dataset you can load into memory
a. Get your model to run: common issues and recommended resolutions
• Shape mismatch / casting issue: step through model creation and inference in a debugger
• Out of memory (OOM): scale back memory-intensive operations one by one
• Other: standard debugging toolkit (Stack Overflow + interactive debugger)
Debuggers for DL code:
• PyTorch: easy, use ipdb
• TensorFlow: trickier
Option 1: step through graph creation
Option 2: step into the training loop and evaluate tensors using sess.run(…)
Option 3: use tfdbg:
python -m tensorflow.python.debug.examples.debug_mnist --debug
This stops execution at each sess.run(…) and lets you inspect the tensors.
Most common causes of shape mismatch:
Undefined shapes:
• Confusing tensor.shape, tf.shape(tensor), and tensor.get_shape()
• Reshaping things to a shape of type Tensor (e.g., when loading data from a file)
Incorrect shapes:
• Flipped dimensions when using tf.reshape(…)
• Took sum, average, or softmax over the wrong dimension
• Forgot to flatten after conv layers
• Forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4))
• Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, loaded it as float32)
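Two of these causes, reducing over the wrong dimension and stray size-1 dimensions, can be demonstrated in a few lines of numpy:

```python
import numpy as np

logits = np.array([[2.0, 1.0, 0.1],
                   [0.1, 1.0, 2.0]])   # (batch=2, classes=3)

def softmax(a, axis):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

good = softmax(logits, axis=-1)  # rows sum to 1: one distribution per example
bad = softmax(logits, axis=0)    # silently normalizes across the batch instead

print(good.sum(axis=1))  # [1. 1.]
print(bad.sum(axis=1))   # not ones -- wrong dimension, but no error raised

# Extra "1" dimensions, e.g. shape (None, 1, 1, 4) after a conv stack:
feat = np.zeros((5, 1, 1, 4))
print(np.squeeze(feat, axis=(1, 2)).shape)   # (5, 4)
```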
Most common causes of casting issues (data not in float32):
• Forgot to cast images from uint8 to float32
• Generated data using numpy in float64, forgot to cast to float32
Most common causes of OOM:
Too big a tensor:
• Too large a batch size for your model (e.g., during evaluation)
• Too large fully connected layers
Too much data:
• Loading too large a dataset into memory, rather than using an input queue
• Allocating too large a buffer for dataset creation
Duplicating operations:
• Memory leak due to creating multiple models in the same session
• Repeatedly creating an operation (e.g., in a function that gets called over and over again)
Other processes:
• Other processes running on your GPU
Other common errors:
• Forgot to initialize variables
• Forgot to turn off bias when using batch norm
• “Fetch argument has invalid type”: usually you overwrote one of your ops with an output during training
b. Overfit a single batch
When trying to overfit a single batch, watch the training error. Common failure modes: error goes up, error explodes, error oscillates, error plateaus.
Error goes up, most common causes:
• Flipped the sign of the loss function / gradient
• Learning rate too high
• Softmax taken over the wrong dimension
Error explodes, most common causes:
• Numerical issue: check all exp, log, and div operations
• Learning rate too high
Error oscillates, most common causes:
• Data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly)
• Learning rate too high
Error plateaus, most common causes:
• Learning rate too low
• Gradients not flowing through the whole model
• Too much regularization
• Incorrect input to loss function (e.g., softmax instead of logits, accidentally added ReLU on output)
• Data or labels corrupted
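The single-batch overfitting test itself is simple. As an illustration, here is a minimal numpy version with a linear model; a correct implementation should drive the training loss on one fixed batch to essentially zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed batch with a known linear relationship
X = rng.standard_normal((16, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.1
losses = []
for step in range(500):
    err = X @ w - y
    losses.append((err ** 2).mean())      # MSE on the single batch
    grad = 2.0 * X.T @ err / len(y)       # gradient of the MSE
    w -= lr * grad                        # plain gradient descent

print(losses[0], losses[-1])  # the loss should fall to ~0 on this one batch
```

If the loss goes up, explodes, oscillates, or plateaus well above zero, work through the causes listed above before touching the full dataset.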
c. Compare to a known result
Hierarchy of known results, from most to least useful:
1. Official model implementation evaluated on a dataset similar to yours (walk through the code line by line to ensure you have the same output, and check your performance is up to par with expectations)
2. Official model implementation evaluated on a benchmark, e.g., MNIST (walk through the code line by line to ensure you have the same output)
3. Unofficial model implementation (same as above, but with lower confidence)
4. Results from the paper, with no code (ensure your performance is up to par with expectations)
5. Results from your model on a benchmark dataset, e.g., MNIST (make sure your model performs well in a simpler setting)
6. Results from a similar model on a similar dataset (get a general sense of what kind of performance can be expected)
7. Super simple baselines, e.g., average of outputs or linear regression (make sure your model is learning anything at all)
Summary: how to implement & debug
a. Get your model to run: step through in a debugger & watch out for shape, casting, and OOM errors
b. Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors
c. Compare to a known result: keep iterating until the model performs up to expectations
3. Evaluate

Bias-variance decomposition
[Bar chart: breakdown of test error by source. Irreducible error (25) → train error (27, avoidable bias +2) → val error (32, variance +5) → test error (34, val overfitting +2)]
Test error = irreducible error + bias + variance + val overfitting

This assumes train, val, and test all come from the same distribution. What if not?
Handling distribution shift (train data vs. test data): use two val sets, one sampled from the training distribution and one from the test distribution.
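A sketch of the two-val-set split, using index arrays as stand-ins for the two labeled pools (all sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for labeled example indices from the two distributions
train_pool = rng.permutation(10_000)  # data from the training distribution
test_pool = rng.permutation(2_000)    # data from the distribution you deploy on

train_val = train_pool[:1_000]  # val set 1: sampled from the training distribution
train_set = train_pool[1_000:]  # train on the rest
test_val = test_pool[:1_000]    # val set 2: sampled from the test distribution
test_set = test_pool[1_000:]    # untouched final test set

# (val#1 error - train error) measures variance;
# (val#2 error - val#1 error) measures distribution shift.
print(len(train_set), len(train_val), len(test_val), len(test_set))
```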
The bias-variance tradeoff
Bias-variance with distribution shift
[Bar chart: breakdown of test error by source, with distribution shift. Irreducible error (25) → train error (27, avoidable bias +2) → train-val error (29, variance +2) → test-val error (32, distribution shift +3) → test error (34, val overfitting +2)]
Running example: train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error)
• Goal performance: 1%
• Train error: 20%. Train − goal = 19% (under-fitting)
• Validation error: 27%. Val − train = 7% (over-fitting)
• Test error: 28%. Test − val = 1% (looks good!)
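Plugging the running example's numbers into the decomposition:

```python
# Running example: goal 1%, train 20%, val 27%, test 28%
goal, train_err, val_err, test_err = 0.01, 0.20, 0.27, 0.28

avoidable_bias = train_err - goal        # 0.19 -> under-fitting dominates
variance = val_err - train_err           # 0.07 -> some over-fitting
val_overfitting = test_err - val_err     # 0.01 -> looks good

print(avoidable_bias, variance, val_overfitting)
```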
Summary: evaluating model performance

Test error = irreducible error + bias + variance + distribution shift + val overfitting