DSC 102 Systems for Scalable Analytics
Arun Kumar
Topic 7: ML Deployment
Not included for Final Exam
Slide Content ACKs: Alkis Polyzotis, Manasi Vartak
The Lifecycle of ML-based Analytics
[Figure: lifecycle diagram with the stages Data acquisition, Data preparation, Feature Engineering, Model Selection, Training & Inference, Model Serving, and Monitoring]
Deployment Stage of Data Science
❖ Data science does not exist in a vacuum. It must interact with the data-generating process and the prediction application
❖ Deploy Stage: Integrate the trained prediction function(s) with the production environment, e.g., offline inference in a data system, or online inference on a Web platform, IoT devices, etc.
❖ Typically, data scientists must work with “DevOps” or “MLOps” engineers to achieve this
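To make the offline/online distinction concrete, here is a minimal sketch in which one trained prediction function is reused in both modes. The toy linear model, its weights, and the names `predict`, `offline_inference`, and `online_inference` are hypothetical illustrations, not part of any particular system.

```python
from typing import Dict, List

# Hypothetical "trained" model: a toy linear scorer with fixed weights.
WEIGHTS: Dict[str, float] = {"age": 0.02, "num_visits": 0.1}
BIAS: float = -1.0


def predict(features: Dict[str, float]) -> float:
    """The trained prediction function: one feature dict in, one score out."""
    return BIAS + sum(w * features[name] for name, w in WEIGHTS.items())


def offline_inference(table: List[Dict[str, float]]) -> List[float]:
    """Offline (batch) inference: score every row of a table in one job,
    e.g., a nightly run inside a data system."""
    return [predict(row) for row in table]


def online_inference(request: Dict[str, float]) -> Dict[str, float]:
    """Online inference: score a single request as it arrives,
    e.g., from a Web platform or an IoT device."""
    return {"score": predict(request)}
```

Most of the deployment work lies outside `predict` itself: wiring it to a data system's tables in the first case, and to a low-latency request path in the second.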
ML in Academia vs Production
What your classes on statistics, ML, AI, etc. cover! ☺
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Deployment Stage of Data Science
❖ The deployment stage typically involves 5 main activities, in sync with the other stages:
1. Packaging and Orchestration
2. Prediction Serving
3. Data Validation
4. Prediction Monitoring
5. Versioning
1. Packaging and Orchestration
❖ Basic Goal: Bundle the software to be deployed, along with its dependencies, into a lightweight, standalone executable that can run almost seamlessly across different OSs and hardware environments
❖ Most common approach today: containerization
❖ Not specific to ML deployment but highly general
❖ An older-generation approach, “virtual machines”, bundled a full OS too, which made them bulky and slow
❖ Docker (for building and running containers) and Kubernetes (for orchestrating them) are the most popular options today
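Containerization handles the OS-level part of packaging, but the container still needs an exact list of software dependencies to install. The sketch below assumes a Python prediction service and records the installed version of each package it needs, in the `name==version` pin format that pip requirements files accept; the resulting file could then be copied into a container image. `pin_dependencies` and `write_requirements` are illustrative helpers, not part of Docker or Kubernetes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple


def pin_dependencies(packages: List[str]) -> Tuple[List[str], List[str]]:
    """Return (pins, missing): exact `name==version` pins for installed
    packages, plus the names that are not installed in this environment."""
    pins: List[str] = []
    missing: List[str] = []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pins, missing


def write_requirements(pins: List[str], path: str = "requirements.txt") -> None:
    """Write one pin per line, the format `pip install -r` reads."""
    with open(path, "w") as f:
        f.write("\n".join(pins) + "\n")
```

For example, `pin_dependencies(["numpy", "scikit-learn"])` pins whatever versions are installed locally; a missing package is reported rather than silently dropped, so the image build can fail early instead of at prediction time.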
1. Packaging and Orchestration
[Figure: Kubernetes vs Docker; source: https://medium.com/edureka/kubernetes-vs-docker-45231abeeaf1]
1. Packaging and Orchestration
❖ Often, one needs to deploy end-to-end pipelines made of largely independent containerized software modules
❖ Workflow orchestration tools help handle such complex pipelines
❖ They let one specify time constraints, operation constraints, etc.
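At its core, a workflow orchestrator runs a pipeline's modules as a dependency graph: a step runs only after every step it depends on has finished. The minimal sketch below captures just that ordering using Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the step names are hypothetical, and the time and operation constraints that orchestration tools also let one specify are not modeled.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    steps: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Run every step after all of its upstream steps; return the run order.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    # Every step becomes a node, even if it has no upstream steps.
    graph = {name: depends_on.get(name, set()) for name in steps}
    unknown = set().union(*graph.values()) - steps.keys()
    if unknown:
        raise ValueError(f"dependencies on undefined steps: {sorted(unknown)}")
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        steps[name]()  # in a real deployment, e.g., launch this step's container
    return order
```

Usage: with hypothetical steps `ingest`, `validate`, `featurize`, and `predict` chained one after another, `run_pipeline` runs them in exactly that order, and it refuses to run at all if the graph has a cycle.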
1. Packaging and Orchestration
❖ Cloud providers are also starting to make it easier to package and deploy prediction software, e.g., Model Endpoint in AWS SageMaker
❖ Data scientists must look out for their organization’s tools and services
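A model endpoint is, at heart, an HTTP service that accepts input features and returns predictions. The sketch below hand-rolls such a service with only Python's standard library, to show what a managed offering takes off the data scientist's plate; it does not use or mimic the SageMaker API, and the `/predict` path, the JSON payload shape, and the toy `predict` function are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict


def predict(features: Dict[str, float]) -> float:
    """Hypothetical trained prediction function."""
    return 0.5 * features.get("x", 0.0)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr


def start_endpoint(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_endpoint(port: int, features: Dict[str, float]) -> float:
    """Client side: POST features as JSON and read back the score."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps(features).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["score"]
```

A production endpoint would also need concerns this sketch ignores, such as input validation, authentication, scaling, and monitoring.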
2. Prediction Serving
❖ Basic Goal: Make ML inference fast, potentially co-optimizing it with the serving environment/infrastructure
❖ Serving is typically handled by automated tools, so data scientists only need to know which systems are available and how to use them
❖ 3 main kinds of systems:
  ❖ Program optimization of the prediction function to improve hardware utilization, e.g., ONNX Runtime or Apache TVM
  ❖ Batch optimization of many concurrent prediction requests to better balance latency and throughput and improve hardware utilization, e.g., AWS SageMaker
  ❖ New hardware optimized for inference, e.g., TPUs
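The batching idea can be illustrated with a simple policy often called dynamic batching: hold incoming requests until either the batch is full or its first request has waited a fixed time, then run the model once on the whole batch. The sketch below only computes which requests would share a batch, given their arrival times; `max_batch_size` and `max_wait_ms` are hypothetical knobs, not settings of AWS SageMaker or any specific system. Raising either knob tends to increase throughput at the cost of per-request latency.

```python
from typing import List


def form_batches(
    arrival_ms: List[float], max_batch_size: int, max_wait_ms: float
) -> List[List[int]]:
    """Group request indices (arrivals sorted by time) into batches.

    A batch closes when it holds max_batch_size requests, or when a new
    request arrives more than max_wait_ms after the batch's first request
    (i.e., the batch's timeout expired before that request showed up).
    """
    batches: List[List[int]] = []
    current: List[int] = []
    batch_start = 0.0
    for i, t in enumerate(arrival_ms):
        if current and t - batch_start > max_wait_ms:
            batches.append(current)  # timeout: dispatch without waiting for more
            current = []
        if not current:
            batch_start = t  # this request opens a new batch
        current.append(i)
        if len(current) == max_batch_size:
            batches.append(current)  # full: dispatch immediately
            current = []
    if current:
        batches.append(current)
    return batches
```

For example, with arrivals at 0, 1, 2, 3, 10, 11, and 30 ms, `max_batch_size=3`, and `max_wait_ms=5`, the requests are grouped as [0, 1, 2], [3], [4, 5], [6]: four model invocations instead of seven.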
3. Data Validation
❖ Basic Goal: Ensure the data fed into the prediction function conforms to its expectations on, say, schema/syntax/shape, integrity constraints (e.g., value ranges or domains), etc.
❖ Must stay in lockstep with the data sourcing stage: acquiring, re-organizing, cleaning, and feature extraction
❖ Industry is starting to build platforms to make this process more rigorous and reusable, e.g., TensorFlow Extended
❖ Data scientists must learn their organization’s data validation practices and tools/APIs
❖ Also covered in Alkis’s guest lecture; further reading: https://mlsys.org/Conferences/2019/doc/2019/167.pdf
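The kinds of checks listed above can be sketched as a small hand-rolled validator that compares each incoming record against a declared schema. The schema contents and the `validate` function are hypothetical; this is not the API of TensorFlow Extended or any other platform, which offer far richer checks (e.g., on whole-dataset statistics).

```python
from typing import Any, Dict, List

# Hypothetical schema for one input record of a prediction function.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "age": {"type": int, "min": 0, "max": 120},
    "country": {"type": str, "domain": {"US", "IN", "DE"}},
    "income": {"type": float, "min": 0.0},
}


def validate(record: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = SCHEMA) -> List[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors: List[str] = []
    for field, rules in schema.items():
        if field not in record:  # schema check: a required field is absent
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):  # syntax/type check
            errors.append(f"{field}: expected {rules['type'].__name__}, got {type(value).__name__}")
            continue
        # Integrity constraints: value ranges and categorical domains.
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: {value!r} below minimum {rules['min']!r}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: {value!r} above maximum {rules['max']!r}")
        if "domain" in rules and value not in rules["domain"]:
            errors.append(f"{field}: {value!r} not in allowed domain")
    for field in record:  # schema check: fields the model does not expect
        if field not in schema:
            errors.append(f"{field}: unexpected field")
    return errors
```

Returning all violations (rather than raising on the first) makes it easier to log what went wrong upstream, which is where such errors usually need to be fixed.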
4. Prediction Monitoring
❖ Basic Goal: Ensure the prediction functions keep working as the data scientist intended; “silent failures” can happen due to concept drift, i.e., the data distribution (or its relationship with the prediction target) has deviated significantly from when the prediction function was built!
❖ Example: A sudden world event drastically changes Web user behavior, e.g., the WHO declares a pandemic! ☺
❖ Must stay in lockstep with the model building stage
❖ Industry today uses ad hoc statistical approaches
❖ Data scientists must look out for their organization’s monitoring practices, since these affect how frequently the lifecycle loop is repeated
❖ Also covered in Alkis’s guest lecture; further reading: https://mlsys.org/Conferences/2019/doc/2019/167.pdf
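One common ad hoc approach is to compare the distribution of each input feature in recent serving traffic against its distribution in the training data. The sketch below does this for a single numeric feature with the two-sample Kolmogorov-Smirnov statistic, computed from scratch; the 0.2 alert threshold is a hypothetical choice that would need tuning. It detects shifts in the inputs only; detecting a change in the input-output relationship requires comparing predictions with ground-truth labels as they arrive.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of the two samples (0 for identical samples, 1 for
    samples with non-overlapping ranges)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so checking every sample point finds the maximum gap.
    for x in ref + cur:
        gap = abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        max_gap = max(max_gap, gap)
    return max_gap


def drift_alert(reference: Sequence[float], current: Sequence[float], threshold: float = 0.2) -> bool:
    """Flag drift when the KS statistic exceeds a (hypothetical) threshold."""
    return ks_statistic(reference, current) > threshold
```

In practice such a check would run per feature over a sliding window of recent requests; an alert would trigger investigation or retraining, which is how monitoring ties back into the model building stage.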
5. Versioning
❖ Basic Goal: Just like regular code, prediction software must be versioned and tracked so that teams can ensure consistency across time and employees, support auditing, “roll back” to a safer state, etc.
❖ But unlike regular code, prediction software has 3 more dependencies beyond the code itself: datasets (train/val/test), configuration (e.g., hyper-parameters), and environment (hardware/software, since these can affect accuracy too)
❖ Research and industry are only beginning to figure this out
❖ Data scientists must look out for versioning best practices/tools
❖ Covered in Manasi’s guest lecture; further reading: https://blog.verta.ai/blog/how-to-move-fast-in-ai-without-breaking-things
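One way to make the four dependencies above trackable is to fingerprint each of them and derive a single version ID from the fingerprints, so that changing any one of them yields a new version. The sketch below is a hand-rolled illustration under that assumption; `version_manifest` and `current_environment` are hypothetical names, not the API of Verta or any other versioning tool, and datasets are passed as in-memory bytes for brevity where a real system would hash files.

```python
import hashlib
import json
import platform
from typing import Any, Dict


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def version_manifest(
    code: bytes,
    datasets: Dict[str, bytes],
    config: Dict[str, Any],
    environment: Dict[str, str],
) -> Dict[str, Any]:
    """Fingerprint code, datasets, configuration, and environment, and combine
    them into one version ID; changing any of them changes the ID."""
    parts: Dict[str, Any] = {
        "code": _sha256(code),
        "datasets": {name: _sha256(blob) for name, blob in sorted(datasets.items())},
        # sort_keys makes the hash independent of dict insertion order
        "config": _sha256(json.dumps(config, sort_keys=True).encode()),
        "environment": _sha256(json.dumps(environment, sort_keys=True).encode()),
    }
    parts["version_id"] = _sha256(json.dumps(parts, sort_keys=True).encode())[:16]
    return parts


def current_environment() -> Dict[str, str]:
    """A tiny environment snapshot; a fuller one would also record package
    versions and hardware details."""
    return {"python": platform.python_version(), "machine": platform.machine()}
```

Storing the whole manifest (not just the ID) alongside each deployed model makes it possible to see which of the four dependencies changed between two versions, which is what auditing and rollback need.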