
Lessons Learned Deploying and Monitoring AI Models in Production at Major Tech Companies



  1. Lessons Learned Deploying and Monitoring AI Models in Production at Major Tech Companies. Contact: harish@datatron.com, jerry@datatron.com

  2. Who are we? Harish Doddi (CEO) and Jerry Xu (CTO), Datatron.

  3. Today’s Enterprise AI life cycle. Development: Discovery (“The Playground”); Production (“The Battleground”); Optimization (“The Battleground”).

  4. Lesson 1: You either NEVER deploy a model, or you have to do it over and over again.

  5. Are your models decaying? Two charts of performance over time. Deploy and done: the model decays over time, so model performance decreases. Deploy over and over: the model replenishes, so model performance stays consistent.

  6. The ML model cycle is a continuously optimizing process. Stages in the cycle: Model Building, Model Deployment, Model Testing, Model Management, Model Monitoring. Concept drift: a new concept comes up …
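
One way the monitoring stage of this cycle might feed back into model building is sketched below: keep a rolling accuracy over recent labelled predictions and flag the model for retraining once it drops below a baseline. This is only an illustration, not something the slides specify; the class name `DecayMonitor`, the window size, and the tolerance are all hypothetical.

```python
from collections import deque


class DecayMonitor:
    """Flags a model for retraining when rolling accuracy falls below a baseline.

    Every name and threshold here is an illustrative assumption.
    """

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Keep only the most recent `window` outcomes: 1 = correct, 0 = wrong.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        """Store whether one production prediction matched the true label."""
        self.outcomes.append(1 if prediction == actual else 0)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy is below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

In practice the true labels often arrive late, so a check like this would probably sit alongside input-distribution checks rather than replace them.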

  7. Connecting Machine Learning to the Software world. Before: software deployment once every 1 or 2 years. Now: software deployment is very frequent and fast, BUT machine learning models deploy very slowly. Future: machine learning models will deploy every day.

  8. Lesson 2: Models may go wrong; you need to monitor them.

  9. South Park and Alexa

  10. Monitoring Learning: Post mortem is the only option. Without model monitoring: the problem occurs, then the team detects the problem and decides what to do. With model monitoring: the problem occurs, the team is notified ASAP, then the team decides what to do.

  11. Monitoring for Machine Learning Models
      Model performance monitoring:
      • Confusion matrix
      • Gain and lift charts
      • Kolmogorov-Smirnov chart
      • Area under the ROC curve
      • Gini coefficient
      • Concordant-Discordant ratio
      • Root Mean Squared Error (RMSE)
      • etc.
      Model timeout monitoring
      Infrastructure monitoring
      Organization KPI monitoring
      Deployment monitoring
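
Several of the performance metrics listed on this slide can be computed with a few lines of plain Python. The sketch below assumes binary labels (0/1) and real-valued model scores; the function names are illustrative, and a production system would more likely use an established metrics library.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def _split_scores(y_true, scores):
    """Separate scores of positive and negative examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    return pos, neg


def auc(y_true, scores):
    """Area under the ROC curve: share of (positive, negative) pairs where
    the positive scores higher (concordant); ties count as half."""
    pos, neg = _split_scores(y_true, scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2 * auc(y_true, scores) - 1


def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    pos, neg = _split_scores(y_true, scores)
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= threshold) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= threshold) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

The pairwise AUC is quadratic in the number of examples, which is fine for a monitoring sample but not for large batches.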

  12. Lesson 3: Your real work starts AFTER you deploy the model to production.

  13. Enterprise AI Life Cycle: Exploration, Training, Deploy.

  14. Enterprise AI Life Cycle After Deployment. Areas: Deploy, A/B Testing, SLA, Model Selection, Anomaly detection. Items listed under them: Blue Green, Canary, Rollback, Deployment Shadowing, Split traffic, Model routing, Challenger, KPI based selection, Monitor performance, Fall back strategy, Alerting, Feature distribution, Model result.
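
A minimal sketch of how split traffic, a challenger model, and a fall back strategy might fit together: a router sends a configurable fraction of requests to the challenger and falls back to the current model if the challenger fails. The function `make_router`, its parameters, and the idea of models as plain callables are assumptions made for illustration, not Datatron's implementation.

```python
import random


def make_router(champion, challenger, canary_fraction=0.1, rng=None):
    """Build a routing function over two models (plain callables here).

    `canary_fraction` is the share of traffic sent to the challenger.
    All names are hypothetical.
    """
    if rng is None:
        rng = random.Random()

    def route(features):
        # rng.random() is in [0, 1), so 0.0 never picks the challenger
        # and 1.0 always does.
        if rng.random() < canary_fraction:
            try:
                return "challenger", challenger(features)
            except Exception:
                # Fall back strategy: serve the champion if the challenger fails.
                pass
        return "champion", champion(features)

    return route
```

Returning the name of the model that served each request makes it possible to compare results per model afterwards, for example for KPI based selection or a rollback decision.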

  15. Lesson 4: Data science is a scarce resource; you need to make sure you organize well.

  16. Deployment Learning: Rise of a new engineering role. Machine Learning Engineer, Deep Learning Engineer. There is a hyper-competitive WAR FOR TALENT that is projected to get much worse.

  17. Teams face cross-functional inefficiencies. Diagram labels: Data Science, DevOps, Engineering, Machine Learning, Production Model, End User. BEST CASE SCENARIO: with a world-class team, 1 model deployed per quarter.
      • Teams operate in silos, don’t speak the same language
      • Errors due to lack of communication
      • Engineering has to write stand-alone scripts

  18. Hidden Technical Debt in Machine Learning Systems (Google paper). “Machine learning offers a fantastically powerful toolkit for building complex systems quickly. … it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.” Topics: Boundary erosion, Entanglement, Hidden feedback loops, Undeclared consumers, Data dependencies, Changes in the external world, System-level anti-patterns.

  19. Lesson 5: Be prepared; your number of models will increase.

  20. Deployment Learning: 1 model vs. multiple models

  21. Cost per model increases significantly if there is no automation. As the number of models increases, the cost also increases. Two charts: cost per model against number of models.

  22. Lesson 6: Senior people are needed AFTER deploying to production.

  23. Software Development vs ML Model Development. Software development: Requirements, Design, Implementation, Testing, Evolution. ML model development: Requirements, Data Preparation, Training / Testing, Deploy to Production, Monitoring and Optimization. Each pipeline marks where senior people are needed.

  24. Lesson 7: Don’t be married to a single framework.

  25. Build/Bring Your Own Models, Frameworks, Languages

  26. Thank you! Innovators Pavilion, Booth P4. harish@datatron.com
