Improving Reproducible Deep Learning Workflows with DeepDIVA

  1. Improving Reproducible Deep Learning Workflows with DeepDIVA. M. Alberti (1)*, V. Pondenkandath (1)*, L. Vögtlin (1), M. Würsch (1, 2), R. Ingold (1), M. Liwicki (1, 3). *Equal contribution. (1) DIVA Group, University of Fribourg, Switzerland. (2) IIT, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland. (3) EISLAB Machine Learning, Luleå University of Technology, Sweden.

  2. Reproducibility Crisis: Trust or Verify? Joelle Pineau, “Reproducible, Reusable, and Robust Reinforcement Learning”, invited talk at NeurIPS 2018, Montreal, Canada.

  3. Why Is This a Problem? No possibility to verify. No possibility to extend. Lots of overhead created. Leads to no trust in scientific results.

  4. How To Make Steps Forward? Ensure reproducibility: of your own experiments; of other people’s experiments. Promote open-source code: make it easy to have “good enough” code; enable code trustworthiness.

  5. How We Contribute: DeepDIVA. Open-source Python framework, built on top of PyTorch. Makes your life easier for: reproducing your own and other people’s experiments. Provides boilerplate code for: common deep learning scenarios; handling time-consuming everyday problems. Documentation and tutorial available.

  6. Reproducing Your Own Experiments: short-term, or work in progress; long-term, or finished work.

  7. Short-term Reproducibility Dangers: kilometres of poor or incomplete log files; stochasticity in the process.

  8. How DeepDIVA Ensures Short-term Reproducibility. Meaningful logging: saving all run parameters and command line args; providing concise coloured logs. Deterministic runs: seeding the pseudo-random number generators of Python, NumPy and PyTorch; disabling cuDNN (NVIDIA CUDA Deep Neural Network library) when necessary.
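The "deterministic runs" idea on this slide can be sketched as follows. This is a minimal illustration, not DeepDIVA's own code: the function name `seed_everything` and its `disable_cudnn` parameter are assumptions made for this example, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int, disable_cudnn: bool = False) -> None:
    """Seed the Python, NumPy and PyTorch pseudo-random number generators.

    Hypothetical helper for illustration; not part of DeepDIVA's API.
    """
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, nothing to seed

    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed, nothing more to do

    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

    if disable_cudnn:
        # Switch cuDNN off entirely (slower, but removes one source of variation)
        torch.backends.cudnn.enabled = False
    else:
        # Keep cuDNN, but ask for deterministic kernels and no auto-tuning
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Two runs with the same seed draw the same pseudo-random numbers
seed_everything(42)
first_run = [random.random() for _ in range(3)]
seed_everything(42)
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)
```

Seeding alone may not be enough on a GPU, since some operations can remain non-deterministic; that is presumably why the slide mentions disabling cuDNN "when necessary".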

  9. Long-term Reproducibility Dangers: poor (or non-existent!) use of version control; hard-to-die bad programming habits; silent data modifications.

  10. How DeepDIVA Ensures Long-term Reproducibility. Git status: linking every run to a specific commit in Git; allowing this feature to be disabled for dev purposes. Copy code: copying the entire running code into the output folder. Data integrity management: footprint of the data in a JSON file using SHA-1 hashes.
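A data footprint of the kind named on this slide could look roughly like the sketch below: one SHA-1 hash per file, stored in a JSON file and compared later to detect silent modifications. The function names and the JSON layout (relative path mapped to hex digest) are assumptions for illustration; the slides do not specify DeepDIVA's actual format.

```python
import hashlib
import json
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_footprint(data_dir: str) -> dict:
    """Map every file under data_dir (by relative path) to its SHA-1 digest."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha1_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def save_footprint(data_dir: str, json_path: str) -> None:
    """Write the footprint as JSON; json_path should live outside data_dir."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(compute_footprint(data_dir), handle, indent=2, sort_keys=True)


def changed_files(data_dir: str, json_path: str) -> list:
    """List files that were added, removed or modified since the footprint was saved."""
    with open(json_path, encoding="utf-8") as handle:
        recorded = json.load(handle)
    current = compute_footprint(data_dir)
    return sorted(
        name
        for name in recorded.keys() | current.keys()
        if recorded.get(name) != current.get(name)
    )
```

Calling `save_footprint` once when the dataset is prepared and `changed_files` before each run gives an empty list as long as the data is untouched.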

  11. Reproducing Other People’s Experiments: given a paper, try to replicate the results and observations.

  12. Reproducing Other People’s Experiments. In order to reproduce an experiment one needs: the Git repository URL; the Git commit identifier (full SHA); the list of command line arguments used; the data.
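The items listed on this slide can be captured automatically at the start of a run. The sketch below is one possible way to do it, assuming the code lives in a Git checkout with a remote named `origin`; the function names and the manifest keys are hypothetical and are not taken from DeepDIVA. Fields that cannot be determined are recorded as `None`.

```python
import json
import subprocess
import sys


def git_output(*args: str):
    """Return the stripped stdout of a git command, or None if it fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=False
        )
    except OSError:
        return None  # git is not installed or cannot be executed
    return result.stdout.strip() if result.returncode == 0 else None


def build_manifest(argv=None) -> dict:
    """Collect what another person needs to re-run this experiment."""
    status = git_output("status", "--porcelain")
    return {
        "repository_url": git_output("config", "--get", "remote.origin.url"),
        "commit": git_output("rev-parse", "HEAD"),  # full commit SHA
        "uncommitted_changes": None if status is None else bool(status),
        "command_line": list(sys.argv if argv is None else argv),
    }


# Hypothetical script name and arguments, used only to show the output shape
manifest = build_manifest(["train.py", "--lr", "0.01"])
print(json.dumps(manifest, indent=2))
```

Storing this manifest next to the run's logs, together with a reference to the data, covers the four items the slide lists.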

  13. Productivity Out-of-the-box. Making your life easier: do not reinvent the wheel!

  14. “One click away” Deep Learning Scenarios

  15. Prepare Your Data: “when the data is ready the task is solved”. Download a dataset with a click: natural images, medical images, historical documents, … Split your dataset: train, validation and test splits. Analyse the data: mean/std and class distributions. Ensure data integrity: compare the footprints.
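The "split your dataset" and "analyse the data" steps can be illustrated with a small standard-library sketch. It assumes a dataset represented as a list of `(sample, label)` pairs; the function names, the split fractions and the toy data are all made up for this example and do not reflect DeepDIVA's scripts. Mean/std computation is left out to keep the example short.

```python
import random
from collections import Counter


def split_dataset(samples, train=0.7, val=0.15, seed=0):
    """Shuffle with a fixed seed, then cut into train, validation and test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # local generator, global state untouched
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],  # whatever remains becomes the test split
    )


def class_distribution(samples):
    """Return the fraction of samples that belongs to each class label."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy dataset: 20 hypothetical file names spread over 3 classes
dataset = [(f"img_{i}.png", i % 3) for i in range(20)]
train_set, val_set, test_set = split_dataset(dataset, seed=42)
print(len(train_set), len(val_set), len(test_set))
print(class_distribution(train_set))
```

Because the shuffle uses its own seeded generator, the same seed always yields the same split, which keeps the data preparation step reproducible as well.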

  16. Real-time Visualizations: confusion matrix; weight histograms; TensorBoard (from TensorFlow); features visualization; performance evaluation.

  17. Automatic Hyper-Parameter Optimization: let machine learning find the best values; no expensive grid or random search.

  18. Be A Part Of It: Getting Started With DeepDIVA

  19. How To Use It. No setup time: from source on Ubuntu (or other flavours of Linux); Docker image coming soon. Documentation: online and in the code. Tutorials: learn new features efficiently. Fork it: extensive and modular for easy modifications.

  20. Make Your Experiment Reproducible: bit.ly/DeepDIVA
