
ModelDB: a system for managing ML models, Manasi Vartak, PhD Candidate

  1. ModelDB: a system for managing ML models. Manasi Vartak, PhD Candidate, MIT Database Group. mvartak@csail.mit.edu | @DataCereal

  2. Why Model Management?

  3. IMDB Prediction Task • Given data about movies (e.g. year made, studio, genres, actors) • Predict IMDB_score

  4. Model 1 LinearRegression Accuracy: 62%

  5. Model 2 Accuracy: 68% CrossValidation

  6. Model 3 RandomForest Accuracy: 75% CrossValidation

  7. Model 4 FeatureEngg RandomForest Accuracy: 80% CrossValidation

  8. Model 50 GBDT FeatureEngg Accuracy: 84% CrossValidation
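Slides 4 to 8 show how quickly model iterations pile up. As a minimal sketch of the kind of record that slide 9 says is missing, the Python below logs each model's type, pipeline steps, and accuracy, then answers two simple queries. `ModelRecord` and `ExperimentLog` are hypothetical names for illustration, not ModelDB's client API; Model 2 is left out because its slide does not state the model type.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """One trained model plus the metadata needed to find it again."""
    model_id: int
    model_type: str
    steps: list[str]
    accuracy: float


@dataclass
class ExperimentLog:
    """Hypothetical in-memory log; a real system would persist these records."""
    records: list[ModelRecord] = field(default_factory=list)

    def log(self, model_id: int, model_type: str, steps: list[str], accuracy: float) -> ModelRecord:
        record = ModelRecord(model_id, model_type, steps, accuracy)
        self.records.append(record)
        return record

    def best(self) -> ModelRecord:
        # Query: the highest-accuracy model logged so far.
        return max(self.records, key=lambda r: r.accuracy)

    def using(self, step: str) -> list[ModelRecord]:
        # Query: every model whose pipeline included the given step.
        return [r for r in self.records if step in r.steps]


# Models from slides 4, 6, 7, and 8.
experiments = ExperimentLog()
experiments.log(1, "LinearRegression", [], 0.62)
experiments.log(3, "RandomForest", ["CrossValidation"], 0.75)
experiments.log(4, "RandomForest", ["FeatureEngg", "CrossValidation"], 0.80)
experiments.log(50, "GBDT", ["FeatureEngg", "CrossValidation"], 0.84)

best = experiments.best()
print(best.model_id, best.model_type)  # → 50 GBDT
print([r.model_id for r in experiments.using("FeatureEngg")])  # → [4, 50]
```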

  9. Why is this a problem? • No record of experiments (“Did my colleague do that already?”) • Insights lost along the way (“How did normalization affect my ROC?”) • Difficult to reproduce results (“What params did I use?”) • Cannot search for or query models (“Where is the prod version of the model for churn?”) • Difficult to collaborate (“How does someone review your model?”)

  10. ModelDB: an end-to-end model management system • Ingest models, metadata • Store and version modeling artifacts • Query • Collaborate, reproduce results
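To make "store and version modeling artifacts" concrete, here is a minimal sketch assuming content-addressed storage: each payload is keyed by its SHA-256 digest, and a new version is recorded only when the bytes differ from the latest version. `ArtifactStore` and the `churn-model/params` key are hypothetical; ModelDB's own storage layer may work differently.

```python
import hashlib
import json


class ArtifactStore:
    """Hypothetical versioned store: each changed payload under a name gets a new version."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}         # digest -> payload
        self._versions: dict[str, list[str]] = {}  # name -> ordered digests

    def put(self, name: str, payload: bytes) -> int:
        digest = hashlib.sha256(payload).hexdigest()
        history = self._versions.setdefault(name, [])
        if history and history[-1] == digest:
            # Unchanged since the latest version: reuse it instead of adding one.
            return len(history)
        self._blobs[digest] = payload
        history.append(digest)
        return len(history)  # versions are numbered from 1

    def get(self, name: str, version: int) -> bytes:
        return self._blobs[self._versions[name][version - 1]]


def encode(params: dict) -> bytes:
    # sort_keys makes the serialization, and therefore the digest, deterministic.
    return json.dumps(params, sort_keys=True).encode()


store = ArtifactStore()
v1 = store.put("churn-model/params", encode({"n_estimators": 100}))
v2 = store.put("churn-model/params", encode({"n_estimators": 100}))
v3 = store.put("churn-model/params", encode({"n_estimators": 300}))
print(v1, v2, v3)  # → 1 1 2
```

Hashing the content means re-logging an unchanged artifact is a no-op in this sketch, while any change in parameters yields a new, retrievable version.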

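  11. ModelDB Architecture [Diagram: client libraries for spark.ml (Scala) and scikit-learn (Python), a Light Client, and others (…) send events over Thrift to the ModelDB Backend, which writes to Storage; the ModelDB Frontend provides visualization and query.]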
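In the architecture above, clients emit events that the backend stores and the frontend queries. The sketch below mimics that flow in a single process, assuming JSON in place of the Thrift serialization shown on the slide. `Event`, `Backend`, and `LightClient` are hypothetical stand-ins, not ModelDB components.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    """Hypothetical event a client emits when something happens in a pipeline."""
    kind: str      # e.g. "fit" or "metric"
    model_id: str
    data: dict = field(default_factory=dict)


class Backend:
    """Stands in for the ModelDB Backend; keeps decoded events in a list."""

    def __init__(self) -> None:
        self.stored: list[Event] = []

    def receive(self, wire: str) -> None:
        # JSON is used here in place of Thrift serialization.
        self.stored.append(Event(**json.loads(wire)))

    def query(self, model_id: str) -> list[Event]:
        return [e for e in self.stored if e.model_id == model_id]


class LightClient:
    """Framework-agnostic client: it only needs to serialize events."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def log(self, kind: str, model_id: str, **data) -> None:
        self.backend.receive(json.dumps(asdict(Event(kind, model_id, data))))


backend = Backend()
client = LightClient(backend)
client.log("fit", "rf-1", model_type="RandomForest")
client.log("metric", "rf-1", name="accuracy", value=0.75)
print([e.kind for e in backend.query("rf-1")])  # → ['fit', 'metric']
```

Because the client only serializes events, supporting another framework in this sketch needs no change to the backend.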

  12. Demo

  13. ML Infrastructure [Diagram of ML pipeline stages: Data Processing, Model Training, Model Serving, Model Management, and Monitoring. Tools shown include DBMSs, Spark, Hive, CSV, Spark.ml, sklearn, R, DL frameworks, H2O, TF-serving, Clipper, and custom solutions; added capabilities include A/B testing, model retraining, visualizations, interpretability, and debugging.]

  14. Benefits of model management • Offline: Developer Productivity (+ Provenance + Reproducibility + Meta-analyses); Increased Transparency (+ What models have been built + How well do models work? + Auditability) • Online: Model Monitoring (+ Model performance over time + Anomaly detection + Trigger retraining); Fast Failure Analyses (+ How was this model built? + What has changed?)
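The online benefits (model performance over time, anomaly detection, trigger retraining) can be illustrated with a simple rule. This is a minimal sketch under a hypothetical policy: flag retraining when the latest score falls more than `tolerance` below the mean of the previous `window` scores. `needs_retraining`, its thresholds, and the sample scores are illustrative assumptions, not taken from the talk.

```python
from statistics import mean


def needs_retraining(history: list[float], window: int = 3, tolerance: float = 0.05) -> bool:
    """Hypothetical rule: flag when the latest score drops more than
    `tolerance` below the mean of the preceding `window` scores."""
    if len(history) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(history[-window - 1:-1])
    return history[-1] < baseline - tolerance


# Illustrative accuracy readings for a deployed model, oldest first.
print(needs_retraining([0.84, 0.83, 0.85, 0.84]))  # → False
print(needs_retraining([0.84, 0.83, 0.85, 0.76]))  # → True
```

A rolling baseline keeps the check relative to recent behavior rather than a fixed target; a production monitor would likely add statistical tests and alerting.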

  15. At last NIPS • Initial version of ModelDB with sklearn, spark.ml support • Early adopters (banks, financial firms), early feedback • Focus on developer productivity

  16. Since last NIPS! • Initial release of ModelDB in February 2017 • Adoption/evaluation at Adobe, banks, financial institutions, and tech companies • Won AIGrant for open-source projects • See papers at SIGMOD, NIPS workshops

  17. Since last NIPS! • Easy installation: docker, pip • Light clients (R, YAML, packages outside of sklearn) • Flexible metadata storage • Collecting metrics over time • In the (research) pipeline: data and intermediate storage, model diagnosis, fine-grained visualizations

  18. ModelDB so far • Incredible inbound interest • Banks, finance, insurance, tech • Lots of feature requests (e.g., monitoring, diagnosis, DL). More than research resources can handle :) • Validation • Every data scientist building > 10 models needs model management and is looking for these tools • Vision: Industry standard tool for managing ML models and metadata

  19. Moving to Apache Incubation • With MIT, Adobe, other partners (*MLSys community) • Open development to wider community • Contributions across industry • Roadmap • Multiple storage backends, DL frameworks, R • Monitoring capabilities

  20. Call for Contributions! • Community over code • Build once, reuse many times • Why? • It will measurably improve your workflow • Pay it forward • Be part of larger open-source project

  21. How to Contribute • Test it out and give feedback • Share: teams, meetups, data science meetings, blogs • Documentation • Code: • Lots of issues on GitHub • Add support for your favorite ML frameworks

  22. Informal Meeting at MLSys • Interested in testing/adopting ModelDB? • Did you build such a system? Can you share lessons? • Open-source Contributors! • How/when • Whova app (“Model Management Meetup”) • mvartak@csail.mit.edu • Poster

  23. People

  24. ModelDB https://github.com/mitdbg/modeldb http://modeldb.csail.mit.edu Manasi Vartak | @DataCereal
