

  1. Asynchronous Hyperparameter Tuning and Ablation Studies with Apache Spark. Sina Sheikholeslami, Distributed Computing Group, KTH Royal Institute of Technology (@cutlash, sinash@kth.se). CASTOR Software Days 2019, October 16, 2019.

  2. The Machine Learning System (Diagram: the ML workflow: Problem Definition, Data Preparation, Model Selection, Model Training with a Dataset, Model, and Optimizer, then Evaluate; repeat if needed.)

  3. Artificial Neural Networks (Diagram: a feed-forward network with an input layer, a hidden layer, and an output layer.)

  4. How We Study the Brain • Early 19th century: ablative brain surgeries by Jean Pierre Flourens (1794-1867)

  5. Ablation for Machine Learning? (Diagram: the ML workflow from slide 2, applied to a housing dataset with features floors, area, and rooms and label price.)

  6. Talk of the Town: “Too frequently, authors propose many tweaks absent proper ablation studies … Sometimes just one of the changes is actually responsible for the improved results … this practice misleads readers to believe that all of the proposed changes are necessary.” (Lipton & Steinhardt, “Troubling Trends in Machine Learning Scholarship”)

  7. Example: Layer Ablation (1/6) (Figure: the base model; accuracy 78%.)

  8. Example: Layer Ablation (2/6) (Figure: accuracy 73%.)

  9. Example: Layer Ablation (3/6) (Figure: the base model.)

  10. Example: Layer Ablation (4/6) (Figure: accuracy 67%.)

  11. Example: Layer Ablation (5/6) (Figure: the base model.)

  12. Example: Layer Ablation (6/6) (Figure: accuracy 63%.)
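To ground the numbers above: done by hand, a layer ablation like this means rebuilding and retraining the model once per removed layer and comparing the resulting accuracies. The sketch below is a minimal, hypothetical illustration in tf.keras (toy data and invented layer names, not the model or the 78%/73%/67%/63% figures from the slides); the tedium of repeating this for every layer and feature is what the rest of the talk automates.

```python
# Manual layer ablation with tf.keras: retrain the model once per removed
# hidden layer and compare validation accuracy against the base model.
# Toy data and layer sizes only; this is not the model from the slides.
import numpy as np
import tensorflow as tf

def build_model(ablate_layer=None):
    """Return a compiled model, optionally leaving out one named hidden layer."""
    hidden = [
        ("dense_1", tf.keras.layers.Dense(64, activation="relu")),
        ("dense_2", tf.keras.layers.Dense(32, activation="relu")),
        ("dense_3", tf.keras.layers.Dense(16, activation="relu")),
    ]
    model = tf.keras.Sequential()
    for name, layer in hidden:
        if name != ablate_layer:          # leave the ablated layer out
            model.add(layer)
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Synthetic stand-in dataset.
x = np.random.rand(1000, 10).astype("float32")
y = (x.sum(axis=1) > 5.0).astype("float32")

for ablated in [None, "dense_1", "dense_2", "dense_3"]:
    hist = build_model(ablated).fit(x, y, epochs=5, validation_split=0.2, verbose=0)
    print(ablated or "base model", "->", round(hist.history["val_accuracy"][-1], 3))
```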

  13. Ablation Study (Diagram: an ablation component proposes a new dataset / model configuration to the machine learning system, which trains and evaluates it and feeds the results back.)

  14. Hyperparameter Tuning (Diagram: a hyperparameter tuner proposes new hyperparameter values to the machine learning system, which trains and evaluates them and feeds the results back.)

  15. System Experimentation (Search) (Diagram: a global experiment controller proposes a new trial to the machine learning system, which evaluates it and feeds the results back.)

  16. Better Parallel • The ability to train better models, faster • The ability to modify and inspect models more easily (“Parallel Training”, by Maxim Melnikov)

  17. Parallelization in Practice (Diagram: the overlap of machine learning, deep learning, and parallel processing. TensorFlow, the TensorFlow logo, and any related marks are trademarks of Google Inc.)

  18. Hopsworks Open-source Platform for Data-intensive AI

  19. Hopsworks Open-source Platform for Data-intensive AI. What is Hopsworks? https://tinyurl.com/y4ze79d4

  20. ML/DL in Hopsworks (Diagram: the Hopsworks ML pipeline: data pipelines (ingest & prep) → feature store → machine learning experiments (hyperparameter tuning, ablation studies) → data-parallel training → model serving. The experiments stage is a bottleneck due to its iterative nature and the human interaction involved.)

  21. Spark and the Bulk Synchronous Parallel Model (Diagram: a Spark job as stages of tasks (Task 11 … Task 1N, Task 21 … Task 2N, Task 31 … Task 3N) separated by synchronization barriers; the driver coordinates the stages and each stage's metrics are written to HDFS.)

  22. Example: Synchronous Hyperparameter Search (Diagram: the same staged execution used for a hyperparameter search; tasks that finish early must wait at each barrier for the slowest trial in the stage, so compute is wasted.)

  23. Critical Requirements
  • Parallel execution of trials
  • Support for early stopping of trials
  • Support for global control of the experiment
  • Resilience to stragglers
  • A simple, “unified” user and developer API

  24. Maggy: An Open-source Framework for Asynchronous Computation on top of Apache Spark

  25. Key Idea: Long-Running Tasks (Diagram: a single Spark stage whose tasks (Task 11 … Task 1N) stay alive for the whole experiment; each task sends metrics to the driver and receives a new trial in return, so no barrier is needed between trials.)
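The slide carries only its title here, so the following is a rough, hypothetical sketch of the idea (not Maggy's actual internals; every name in it is invented for illustration): each executor keeps one long-running task alive for the entire experiment, repeatedly asking the driver for the next trial and streaming metrics back, instead of stopping at a barrier after every trial.

```python
# Hypothetical worker loop for a long-running task. `rpc`, `get_next_trial`,
# `report_metrics`, and `should_stop` are invented names standing in for the
# driver <-> executor communication; they are not Maggy's real internal API.
def worker_loop(rpc, train_fn):
    while True:
        trial = rpc.get_next_trial()                  # block until the driver assigns work
        if trial is None:                             # experiment finished or stopped globally
            break
        for metrics in train_fn(**trial["params"]):   # train_fn yields per-epoch metrics
            rpc.report_metrics(trial["id"], metrics)  # heartbeat back to the driver
            if rpc.should_stop(trial["id"]):          # driver may early-stop this trial
                break
```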

  26. Maggy Core Architecture

  27. Back to Ablation

  28. LOCO: Leave One Component Out
  • A simple, “natural” ablation policy: an implementation of an ablator
  • Currently supports feature ablation and layer ablation
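As a framework-free illustration of the policy itself (not Maggy's implementation), LOCO just enumerates the components of the base configuration and emits one trial per left-out component, plus the unmodified base configuration as the reference point:

```python
# Minimal sketch of the LOCO policy: one trial per left-out component,
# plus the full base configuration to compare against.
def loco_trials(features, layers):
    yield {"ablated_feature": None, "ablated_layer": None}    # base model
    for feature in features:
        yield {"ablated_feature": feature, "ablated_layer": None}
    for layer in layers:
        yield {"ablated_feature": None, "ablated_layer": layer}

for trial in loco_trials(["floors", "area", "rooms"], ["dense_1", "dense_2"]):
    print(trial)
```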

  29. Feature Ablation
  • Uses the Feature Store to access the dataset metadata
  • Generates Python callables that, once called, return modified datasets
  • Removes one feature at a time
  (Figure: the housing dataset with features floors, area, and rooms and label price, shown with one feature removed at a time.)
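The slide describes the mechanism rather than showing code. As a hypothetical stand-in (plain pandas instead of the Hopsworks Feature Store, and an invented helper name), the "callables that return modified datasets" idea looks roughly like this:

```python
import pandas as pd

def make_feature_ablation_fns(df, label_col):
    """Return {feature: callable}; each callable yields a copy of the
    dataset with that single feature dropped (the label is always kept)."""
    fns = {}
    for col in df.columns:
        if col == label_col:
            continue
        fns[col] = lambda c=col: df.drop(columns=[c])  # default arg freezes `col`
    return fns

housing = pd.DataFrame({"floors": [1, 2], "area": [70, 120],
                        "rooms": [3, 5], "price": [200, 450]})
ablations = make_feature_ablation_fns(housing, label_col="price")
print(ablations["area"]().columns.tolist())   # ['floors', 'rooms', 'price']
```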

  30. Layer Ablation
  • Uses a base model function
  • Generates Python callables that, once called, return modified models
  • Uses the model configuration to find and remove layer(s)
  • Removes one layer at a time (or one layer group at a time)
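And the layer-ablation counterpart, again as a hypothetical sketch (invented layer names; the exact structure of a Keras model config can vary between versions): starting from a base model function, generate one callable per ablatable layer that rebuilds the model from its configuration with that layer removed.

```python
import tensorflow as tf

def base_model_fn():
    """Base model function: returns a fresh model with named layers."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", name="dense_1"),
        tf.keras.layers.Dense(32, activation="relu", name="dense_2"),
        tf.keras.layers.Dense(1, activation="sigmoid", name="output"),
    ])

def make_layer_ablation_fns(model_fn, ablatable):
    """Return {layer_name: callable}; each callable rebuilds the model
    from its config with that one layer left out."""
    fns = {}
    for name in ablatable:
        def build(ablate=name):  # default arg freezes the layer name
            config = model_fn().get_config()
            config["layers"] = [l for l in config["layers"]
                                if l["config"]["name"] != ablate]
            return tf.keras.Sequential.from_config(config)
        fns[name] = build
    return fns

ablated = make_layer_ablation_fns(base_model_fn, ["dense_1", "dense_2"])["dense_2"]()
print([layer.name for layer in ablated.layers])   # ['dense_1', 'output']
```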

  31. Ablation User & Developer API (Example Notebook Available!)

  32. User API: Initialize the Study and Add Features
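The code cell from this slide did not survive the transcription. Based on the Maggy documentation and example notebooks of that period, it most likely looked roughly like the following; the dataset name, label, and feature names below are placeholders, and the exact constructor arguments may differ between Maggy versions.

```python
from maggy.ablation import AblationStudy

# Create an ablation study over a training dataset registered in the
# Hopsworks Feature Store, naming the label column.
ablation_study = AblationStudy('housing_train_dataset', label_name='price')

# Register the features to ablate, one at a time.
ablation_study.features.include('floors', 'area', 'rooms')
```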

  33. User API: Define Base Model
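The base model is supplied as a function that builds and returns the model, with explicitly named layers so the ablator can find and remove them by name. A plausible reconstruction (layer names invented):

```python
import tensorflow as tf

def base_model_generator():
    # Explicit layer names matter: the ablator removes layers by name.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', name='my_dense_one'),
        tf.keras.layers.Dense(32, activation='relu', name='my_dense_two'),
        tf.keras.layers.Dense(1, activation='sigmoid', name='my_output'),
    ])
```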

  34. User API: Setup Model Ablation
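Setting up model ablation then means attaching that generator to the study and naming the layers (or layer groups) to leave out. Roughly, following the Maggy docs of the time (method names may have changed since):

```python
# Attach the base model and declare which layers to ablate, one per trial.
ablation_study.model.set_base_model_generator(base_model_generator)
ablation_study.model.layers.include('my_dense_one', 'my_dense_two')
```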

  35. User API: Wrap the Training Function
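The training function is written so Maggy can inject the per-trial artifacts: it receives callables for the (possibly ablated) dataset and model, trains, and returns the metric used to compare trials. A rough reconstruction; whether the injected dataset callable takes arguments such as batch size depends on the Maggy version:

```python
def training_fn(dataset_function, model_function):
    import tensorflow as tf

    # Each trial gets its own (possibly ablated) dataset and model.
    train_ds = dataset_function()
    model = model_function()
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(train_ds, epochs=5, verbose=0)

    # The returned value is what Maggy records and compares across trials.
    return float(history.history['accuracy'][-1])
```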

  36. User API: Lagom!
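"Lagom" is Swedish for "just the right amount", and a single lagom call launches the whole experiment on Spark. A plausible reconstruction of the launch cell, with argument names taken from the Maggy documentation of that period (they may differ in later releases):

```python
from maggy import experiment

# Launch the ablation study: Maggy runs one trial per ablated component
# (plus the base model) across the long-running Spark tasks.
result = experiment.lagom(training_fn,
                          experiment_type='ablation',
                          ablation_study=ablation_study,
                          ablator='loco',
                          name='housing_loco_ablation')
```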

  37. Developer API: Policy Implementation (1/2)

  38. Developer API: Policy Implementation (2/2)
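Both code slides for the developer API are missing from the transcript. The gist is that a new ablation policy is a class implementing Maggy's abstract ablator interface, whose job is to hand the experiment driver the next trial whenever an executor becomes free. To avoid guessing the exact base class and method signatures (they are in the API docs linked on slide 41), here is a framework-free sketch of the shape such a policy takes, using a hypothetical "reverse LOCO" ordering:

```python
# Framework-free sketch of an ablation policy ("ablator"). In Maggy this would
# subclass the abstract ablator class documented at
# https://maggy.readthedocs.io/en/latest/ ; method names here are illustrative only.
class ReverseLocoPolicy:
    """Hypothetical policy: the same trials as LOCO, served in reverse order."""

    def __init__(self, features, layers):
        # Base configuration first, then one trial per left-out component.
        self.trial_buffer = [{"ablated_feature": None, "ablated_layer": None}]
        self.trial_buffer += [{"ablated_feature": f, "ablated_layer": None} for f in features]
        self.trial_buffer += [{"ablated_feature": None, "ablated_layer": l} for l in layers]

    def number_of_trials(self):
        return len(self.trial_buffer)

    def next_trial(self):
        # Called whenever an executor is free; None signals "no trials left".
        return self.trial_buffer.pop() if self.trial_buffer else None
```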

  39. Hyperparameter Tuning: User API
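For hyperparameter tuning, the user-facing pattern mirrors the ablation API: declare a search space, write a training function that reports its metric through the injected reporter, and launch with lagom. A plausible reconstruction based on the Maggy README and docs of the time (hyperparameter names and ranges are invented, and the reporter and lagom argument names may differ between versions):

```python
from maggy import experiment, Searchspace

# Declare the hyperparameters and the ranges to search over.
sp = Searchspace(kernel=('INTEGER', [2, 8]),
                 pool=('INTEGER', [2, 8]),
                 dropout=('DOUBLE', [0.01, 0.99]))

def train_fn(kernel, pool, dropout, reporter):
    # ... build and train a model with these hyperparameter values ...
    accuracy = 0.0                       # placeholder for the real training result
    reporter.broadcast(metric=accuracy)  # stream metrics so trials can be early-stopped
    return accuracy

result = experiment.lagom(train_fn,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=15,
                          name='hp_tuning_demo')
```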

  40. Hyperparameter Tuning: Developer API
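The developer API again mirrors the ablation side: a new tuner is a class implementing Maggy's optimizer interface, proposing the next trial given the results observed so far. To avoid guessing the exact base class and signatures, here is a framework-free sketch of the contract such a policy fulfils, using plain random search:

```python
import random

# Framework-free sketch of a tuner ("optimizer") policy; Maggy's real developer
# interface is documented at https://maggy.readthedocs.io/en/latest/ and this
# only illustrates the contract it fulfils.
class RandomSearchPolicy:
    def __init__(self, searchspace, num_trials):
        self.searchspace = searchspace   # {name: (low, high)} continuous ranges
        self.remaining = num_trials
        self.results = []                # (params, metric) pairs filled in by the driver

    def next_trial(self):
        # Propose the next hyperparameter combination, or None when the budget is spent.
        if self.remaining == 0:
            return None
        self.remaining -= 1
        return {name: random.uniform(low, high)
                for name, (low, high) in self.searchspace.items()}

    def finalize(self):
        # Report the best trial seen (maximizing the metric).
        return max(self.results, key=lambda r: r[1]) if self.results else None

policy = RandomSearchPolicy({'dropout': (0.01, 0.99), 'lr': (1e-4, 1e-1)}, num_trials=3)
print(policy.next_trial())
```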

  41. Maggy is Open-source
  • Code repository: https://github.com/logicalclocks/maggy
  • API documentation: https://maggy.readthedocs.io/en/latest/

  42. Next Steps
  • More ablators
  • More tuners
  • Support for more frameworks

  43. Thank you! Thanks to the entire Logical Clocks team, especially: Moritz Meister (@morimeister), Jim Dowling (@jim_dowling), Robin Andersson (@robzor92), Kim Hammar (@KimHammar1), and Alex Ormenisan (@alex_ormenisan). @logicalclocks @hopsworks on GitHub (example notebook available!): https://github.com/hopshadoop/maggy https://maggy.readthedocs.io/en/latest/ https://logicalclocks.com/whitepapers/ Contact: @cutlash, sinash@kth.se. CASTOR Software Days 2019, October 16, 2019.
