Architecture of an NLP Deployment

Michelle Casbon (@texasmichelle), QCon São Paulo, May 9, 2018


  1. Architecture of an NLP Deployment. Michelle Casbon, QCon São Paulo, May 9, 2018

  2. whoami: @texasmichelle

  3. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  4. ML decision tree
     - Is this a clearly defined problem? No: Move along. Yes: next question.
     - Can it be solved in a deterministic way? Yes: Do that. No: Dive in.
     Source: David Andrzejewski @davidandrzej
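
The two questions on this slide can be written as a small function. This is only an illustrative sketch: the function name `ml_decision` and its boolean parameters are hypothetical and do not come from the talk; the returned strings are the leaf labels from the slide.

```python
def ml_decision(clearly_defined: bool, deterministic_solution: bool) -> str:
    """Walk the slide's decision tree and return the label of the leaf reached."""
    if not clearly_defined:
        # The problem is not clearly defined: ML is not the next step.
        return "Move along"
    if deterministic_solution:
        # A deterministic solution exists, so prefer it over ML.
        return "Do that"
    # Clearly defined, but no deterministic solution: a candidate for ML.
    return "Dive in"


if __name__ == "__main__":
    for defined in (False, True):
        for deterministic in (False, True):
            print(defined, deterministic, "->", ml_decision(defined, deterministic))
```

Note that the second question is only reached when the first answer is yes, which is why the function returns early for an ill-defined problem.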

  5. MACHINE LEARNING: Counting things is still really hard.

  6. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  7. Evolution of NLP architectures
     - In the beginning: Hand-crafted, artisan systems
     - Yesteryear: Cobbled-together with tools from other domains
     - Today: Purpose-built
     - Tomorrow: Future

  8. Application examples

  9. Frontend (diagram): Web application, Orchestration Layer, Microservices (one of them the NLP Microservice), Model Store, OLTP, OLAP

  10. Data Warehouse (diagram): Analytics Layer, OLAP, ETL, OLTP sources, NLP Microservice, Model Store

  11. Data Pipeline (diagram): Source, Formatter, Lookup, NLP Microservice, Model Store, OLTP

  12. NLP microservice(s)
     - Training data preparation: Data ingestion, Data validation, Data analysis, Data transformation
     - Training: Data segmentation, Featurization, Model building
     - Serving: Model retrieval, Featurization, Prediction, REST server
     - Cross-validation: Data segmentation, Featurization, Prediction, Evaluation
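
The stage names on this slide map naturally onto small functions. The sketch below is a toy, standard-library-only illustration of segmentation, featurization, model building, prediction, and evaluation; it is not the implementation from the talk. The function names, the bag-of-words scoring rule, and the `TOY_EXAMPLES` data are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Toy labelled data, invented for illustration only.
TOY_EXAMPLES = [
    ("great product love it", "pos"),
    ("terrible service hate it", "neg"),
    ("love the great support", "pos"),
    ("hate the terrible delay", "neg"),
    ("great great love", "pos"),
    ("terrible hate awful", "neg"),
]


def segment(examples, holdout_every=4):
    """Data segmentation: hold out every Nth example for evaluation."""
    train = [ex for i, ex in enumerate(examples) if i % holdout_every != 0]
    held_out = [ex for i, ex in enumerate(examples) if i % holdout_every == 0]
    return train, held_out


def featurize(text):
    """Featurization: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def build_model(train):
    """Model building: accumulate word counts per label."""
    model = defaultdict(Counter)
    for text, label in train:
        model[label].update(featurize(text))
    return dict(model)


def predict(model, text):
    """Prediction: choose the label whose word counts overlap most with the input."""
    features = featurize(text)
    scores = {
        label: sum(counts[word] * n for word, n in features.items())
        for label, counts in model.items()
    }
    # Iterating over sorted labels makes ties resolve deterministically.
    return max(sorted(scores), key=scores.get)


def evaluate(model, held_out):
    """Evaluation: fraction of held-out examples predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in held_out)
    return correct / len(held_out)


if __name__ == "__main__":
    train, held_out = segment(TOY_EXAMPLES)
    model = build_model(train)
    print(predict(model, "love this great tool"))
    print(evaluate(model, held_out))
```

The same `featurize` function is shared by training, serving, and cross-validation, which mirrors the way Featurization appears in three of the four groups on the slide.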

  13. Frontend (diagram): Web application, Orchestration Layer, Microservices (one of them the NLP Microservice), Model Store, OLTP, OLAP

  14. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  15. Perception (diagram): Data Collection, Resource Management, Serving Infrastructure, UI, Process Management, ML Code, Data Verification, Configuration, Feature Extraction, Application Logic, Monitoring, Analysis Tools

  16. Reality (diagram): Data Collection, Resource Management, Serving Infrastructure, UI, Process Management, Configuration, Monitoring, ML Code, Data Verification, Feature Extraction, Analysis Tools, Application Logic

  17. Components by area
     - Data: Data Ingestion, Data Exploration, Data Transformation, Data Validation, Data Analysis, Training Data Segmentation
     - Featurization: Feature Extraction
     - Training: Model Building, Model Validation, Model Versioning, Model Auditing, Distributed Training, Continuous Training
     - Application and Platform: Serving Infrastructure, Configuration, Business Logic, Process Management, UI, Resource Management, Load Balancing, Monitoring, Logging, Continuous Delivery, Authentication/Authorization

  18. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  19. Guiding principles
     - Robustness: High availability, Fault-tolerant
     - Resiliency: Autoscaling, Constrained resource consumption
     - Versioning: Models, Data, Hyperparameters, System config
     - Continuous delivery: Optimize for person-hours, Per microservice
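
One way to make the Versioning principle concrete is to derive a version record that changes whenever the model, the data, the hyperparameters, or the system config changes. The sketch below is an assumption-laden illustration, not the scheme used in the talk: `fingerprint`, `deployment_version`, the 12-character truncation, and the sample inputs are all hypothetical.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Short content hash; truncated to 12 hex characters for readability."""
    return hashlib.sha256(payload).hexdigest()[:12]


def canonical(obj) -> bytes:
    """Serialize a JSON-compatible object with sorted keys so key order is irrelevant."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def deployment_version(model_bytes, data_bytes, hyperparameters, system_config):
    """Fingerprint each of the four versioned items, plus one combined identifier."""
    parts = {
        "model": fingerprint(model_bytes),
        "data": fingerprint(data_bytes),
        "hyperparameters": fingerprint(canonical(hyperparameters)),
        "system_config": fingerprint(canonical(system_config)),
    }
    # The combined identifier changes if any single part changes.
    parts["combined"] = fingerprint(canonical(parts))
    return parts


if __name__ == "__main__":
    version = deployment_version(
        b"serialized model weights",   # placeholder bytes
        b"training data snapshot",     # placeholder bytes
        {"learning_rate": 0.1, "epochs": 3},
        {"replicas": 2},
    )
    for name, value in version.items():
        print(name, value)
```

Keeping a separate fingerprint per item makes it possible to see which of the four changed between two deployments, rather than only that something changed.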

  20. Guiding principles
     - Headings: Everything in one place; Everything in source control; Automation; Empowerment
     - Points: Tests; Deployment; If you don't need to manage it yourself, don't; Take it with you; Store everything; Communicate progress; Positive & negative training data; Goals; Add feedback to the UI; Measure; Logging; Traction; Monitoring; Transparency

  21. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  22. Duolingo legacy. Source: Rewriting Duolingo's engine in Scala

  23. Duolingo today. Source: Rewriting Duolingo's engine in Scala
     - Redesigned architecture
     - Refactored code from Python to Scala
     - Latency dropped from 750ms to 14ms
     - Engine uptime increased from 99.9% to 100%

  24. Architecture (diagram): Frontend, Orchestration Layer, Microservice, NLP Microservice, NLP Microservice, Microservice, Microservice

  25. Qordoba on GCP

  26. Qordoba

  27. @texasmichelle

  28. GitOps
     - Optimizes for person-hours
     - Empowers engineers & data scientists
     - Cluster state is always recoverable, with a historical record
     - Repeating cycle (diagram): Create a new feature, PR review, Deployment, Verify feature
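
The claim that cluster state is always recoverable follows from treating the repository as the source of truth: any committed state can be re-applied. The sketch below illustrates that idea with plain dictionaries. It is a simplified model under stated assumptions, not the tooling used in the talk: `plan`, `apply_actions`, the service names, and the version strings are all hypothetical, and no real cluster API is involved.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compare the desired state (from source control) with the actual state."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name, desired[name]))
        elif actual[name] != desired[name]:
            actions.append(("update", name, desired[name]))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, actual[name]))
    return actions


def apply_actions(actual: dict, actions: list) -> dict:
    """Return the new state after applying the planned actions."""
    result = dict(actual)
    for verb, name, version in actions:
        if verb == "delete":
            result.pop(name)
        else:
            result[name] = version
    return result


if __name__ == "__main__":
    # Each entry stands for the desired state recorded at one commit.
    history = [
        {"frontend": "v1", "nlp": "v1"},
        {"frontend": "v2", "nlp": "v1", "orchestrator": "v1"},
    ]
    cluster = {}
    for desired in history:
        cluster = apply_actions(cluster, plan(desired, cluster))
    print(cluster)
    # Recover an earlier state by re-applying an older commit.
    print(apply_actions(cluster, plan(history[0], cluster)))
```

Because every change arrives as a commit, the list of commits doubles as the historical record mentioned on the slide.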

  29. Kubeflow (https://github.com/kubeflow/kubeflow)
     - Who: Data scientists, ML researchers, Software engineers, Product managers
     - What: Portable ML products on k8s; 0.1 release
     - Why: Because building a platform is too big of a problem to tackle alone

  30. Make it easy for everyone to develop, deploy, & manage portable, scalable ML everywhere
     - Composability: Single, unified tool for common processes
     - Portability: Entire stack; Reduce variability between services & environments
     - Scalability: Native to k8s; Support specialized hardware, like GPUs; Reduce costs; Improve model performance
     - Also listed: Full product lifecycle

  31. Kubeflow
     - Kubernetes-native platform for ML: Run wherever k8s runs; Use k8s to manage ML tasks; CRDs for distributed training
     - Adopt k8s patterns: Microservices; Manage infra declaratively
     - Package infrastructure components together: Ksonnet; Move between local -> dev -> test -> prod -> onprem
     - Support multiple ML frameworks: Tensorflow, Pytorch, Scikit, Xgboost, et al.

  32. E2E Example: GitHub Issue Summarization
     - How to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow
       https://github.com/kubeflow/examples/tree/master/github_issue_summarization
     - Kubeflow installation with ksonnet
     - Persistent disk usage
     - Jupyterhub
     Source: Hamel Husain

  33. Exploration/experimentation
     - Choose a dataset
     - Slice and dice
     - Try out various means of featurization
     - Train a number of models & compare
     - Plot various statistics along the way
     - Jupyterhub on k8s: Security, Reproducibility, Resource allocation, Scale beyond a laptop, Centralized storage

  34. E2E Example
     - Scaling featurization and training: TFJob, tensor2tensor
     - Model deployment with SeldonIO
     - Accessing via a simple web app
     - Teardown

  35. Try it yourself
     - GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
     - Katacoda: https://www.katacoda.com/kubeflow
     - http://gh-demo.kubeflow.org/
     Special thanks: Jeremy Lewi, Ankush Agarwal

  36. Just the beginning
     - Easier setup
     - Utilize more k8s features
     - Add support for packages, frameworks, libraries, and example models
     - You tell us!
     - Get involved: github.com/kubeflow, kubeflow.slack.com, @kubeflow, kubeflow-discuss@googlegroups.com

  37. Agenda: (1) Evolution of NLP, (2) Components, (3) Guiding principles, (4) Implementations, (5) Future architectures

  38. "OK Google, build me a classifier." (Future Michelle)

  39. g.co/next18: July 24-27, 2018, San Francisco
