Architecture of an NLP Deployment
Michelle Casbon
QCon São Paulo, May 9, 2018
whoami @texasmichelle
Agenda
1. Evolution of NLP
2. Components
3. Guiding principles
4. Implementations
5. Future architectures
ML decision tree
Is this a clearly defined problem?
  No -> Move along
  Yes -> Can it be solved in a deterministic way?
    Yes -> Do that
    No -> Dive in
Source: David Andrzejewski @davidandrzej
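The flowchart above amounts to two yes/no questions asked in order. A minimal sketch of that logic follows; the function name and parameter names are illustrative assumptions, not something from the talk.

```python
def ml_decision(clearly_defined: bool, deterministic_solution: bool) -> str:
    """Walk the two questions of the flowchart and return its leaf label."""
    if not clearly_defined:
        # The problem is not clearly defined, so ML is not the next step.
        return "Move along"
    if deterministic_solution:
        # A deterministic solution exists, so prefer it over ML.
        return "Do that"
    # Clearly defined, but no deterministic solution: a candidate for ML.
    return "Dive in"


if __name__ == "__main__":
    print(ml_decision(clearly_defined=True, deterministic_solution=False))
```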
MACHINE LEARNING: Counting things is still really hard.
Evolution of NLP architectures
In the beginning: Hand-crafted, artisan systems
Yesteryear: Cobbled-together with tools from other domains
Today: Purpose-built
Tomorrow: Future
Application examples
Web application (diagram): Frontend, Orchestration Layer, Microservices (including an NLP Microservice), Model Store, OLTP, OLAP
Data Warehouse (diagram): OLTP sources, ETL, NLP Microservice, Model Store, OLAP, Analytics Layer
Data Pipeline (diagram): Source, Formatter, NLP Microservice, Model Store, Lookup, OLTP
NLP microservice(s)
Training data preparation: Data ingestion, Data validation, Data analysis, Data transformation
Training: Data segmentation, Featurization, Model building
Serving: Model retrieval, Featurization, Prediction, REST server
Cross-validation: Data segmentation, Featurization, Prediction, Evaluation
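To make the stage names above concrete, here is a minimal, standard-library sketch of segmentation, featurization, model building, prediction, and cross-validated evaluation. It assumes a toy bag-of-words scorer and made-up sentiment examples; it is not the system described in the talk, and the REST server and model retrieval steps are left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, for illustration only.
EXAMPLES = [
    ("great product love it", "positive"),
    ("love this great service", "positive"),
    ("terrible product hate it", "negative"),
    ("hate this terrible service", "negative"),
]


def featurize(text):
    """Featurization: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def build_model(examples):
    """Model building: sum token counts per label."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(featurize(text))
    return dict(model)


def predict(model, text):
    """Prediction: pick the label whose training tokens overlap the input most."""
    features = featurize(text)

    def score(label):
        return sum(n * model[label][token] for token, n in features.items())

    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(model), key=score)


def segment(examples, k, fold):
    """Data segmentation: hold out every k-th example, offset by the fold index."""
    train = [ex for i, ex in enumerate(examples) if i % k != fold]
    held_out = [ex for i, ex in enumerate(examples) if i % k == fold]
    return train, held_out


def cross_validate(examples, k=2):
    """Cross-validation: mean held-out accuracy over k folds."""
    accuracies = []
    for fold in range(k):
        train, held_out = segment(examples, k, fold)
        model = build_model(train)
        correct = sum(predict(model, text) == label for text, label in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


if __name__ == "__main__":
    print("cross-validated accuracy:", cross_validate(EXAMPLES))
    print("prediction:", predict(build_model(EXAMPLES), "love it"))
```

Training and serving both call the same `featurize` function, which mirrors the way Featurization appears under Training, Serving, and Cross-validation on the slide.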
Components
Perception (diagram): ML Code, Data Collection, Data Verification, Feature Extraction, Configuration, Resource Management, Process Management, Analysis Tools, Serving Infrastructure, Monitoring, Application Logic, UI
Reality (diagram): ML Code, Data Collection, Data Verification, Feature Extraction, Configuration, Resource Management, Process Management, Analysis Tools, Serving Infrastructure, Monitoring, Application Logic, UI
Data: Data Ingestion, Data Exploration, Data Transformation, Data Validation, Data Analysis, Training Data Segmentation
Featurization: Feature Extraction
Training: Model Building, Model Validation, Model Versioning, Model Auditing, Distributed Training, Continuous Training
Application: Configuration, Business Logic, UI, Load Balancing
Platform: Serving Infrastructure, Process Management, Resource Management, Monitoring, Logging, Continuous Delivery, Authentication/Authorization
Guiding principles
Robustness: High availability, Fault-tolerant
Resiliency: Autoscaling, Constrained resource consumption
Versioning: Models, Data, Hyperparameters, System config
Continuous delivery: Optimize for person-hours, Per microservice
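One way to read the Versioning principle is that a deployment is identified by all four versioned things together: model, data, hyperparameters, and system config. The sketch below derives a single identifier from them by hashing a canonical JSON manifest. The function name, field names, and the choice of SHA-256 are assumptions for illustration, not the approach used in the talk.

```python
import hashlib
import json


def version_id(model_ref, data_ref, hyperparameters, system_config):
    """Derive one short identifier from everything that defines a deployment."""
    manifest = {
        "model": model_ref,            # e.g. a model artifact checksum (hypothetical)
        "data": data_ref,              # e.g. a training data snapshot id (hypothetical)
        "hyperparameters": hyperparameters,
        "system_config": system_config,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    print(version_id("model-a", "snapshot-1", {"lr": 0.1, "epochs": 5}, {"replicas": 2}))
```

Changing any one of the four inputs yields a different identifier, so two deployments with the same identifier were built from the same model, data, hyperparameters, and config.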
Guiding principles: Everything in one place; Everything in source control; Automation; Empowerment; Tests; Deployment; If you don't need to manage it yourself, don't; Take it with you; Store everything; Communicate progress; Positive & negative training data; Goals; Add feedback to the UI; Measure; Logging; Traction; Monitoring; Transparency
Implementations
Duolingo legacy Source: Rewriting Duolingo's engine in Scala
Duolingo today
• Redesigned architecture
• Refactored code from Python to Scala
• Latency dropped from 750ms to 14ms
• Engine uptime increased from 99.9% to 100%
Source: Rewriting Duolingo's engine in Scala
Architecture (diagram): Frontend, Orchestration Layer, Microservices (including two NLP Microservices)
Qordoba on GCP
Qordoba
GitOps
● Optimizes for person-hours
● Empowers engineers & data scientists
● Cluster state is always recoverable, with a historical record
Workflow (diagram): Create a new feature -> PR review -> Deployment -> Verify feature
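The claim that cluster state is always recoverable follows from keeping the desired state in version control and reconciling the cluster toward it. The sketch below illustrates that idea only. The commit history is a plain list, and the service names and image tags are hypothetical. A real setup would read manifests from git and call the cluster API.

```python
def reconcile(desired, actual):
    """Return the actions that move the actual state to the desired state."""
    actions = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, image))
        elif actual[name] != image:
            actions.append(("update", name, image))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply_actions(actual, actions):
    """Apply the actions to a copy of the actual state and return the result."""
    state = dict(actual)
    for verb, name, image in actions:
        if verb == "delete":
            del state[name]
        else:
            state[name] = image
    return state


# Hypothetical history: each entry is the desired state recorded by one merged PR.
HISTORY = [
    {"nlp-service": "nlp:1.0"},
    {"nlp-service": "nlp:1.1", "web-app": "web:2.0"},
]

if __name__ == "__main__":
    # Deploy the latest recorded state to an empty cluster.
    cluster = apply_actions({}, reconcile(HISTORY[-1], {}))
    print("deployed:", cluster)
    # Recover an earlier state by reconciling against an older entry.
    rollback = reconcile(HISTORY[0], cluster)
    print("rollback actions:", rollback)
    print("recovered:", apply_actions(cluster, rollback))
```

Because every desired state is an entry in the history, rolling back is the same operation as deploying: pick an entry and reconcile against it.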
Kubeflow
Who: Data scientists, ML researchers, Software engineers, Product managers
What: Portable ML products on k8s; 0.1 release
Why: Because building a platform is too big of a problem to tackle alone
https://github.com/kubeflow/kubeflow
Make it easy for everyone to develop, deploy, & manage portable, scalable ML everywhere
Composability, Portability, Scalability: Full product lifecycle; Single, unified tool for common processes; Entire stack; Native to k8s; Support specialized hardware, like GPUs; Reduce variability between services & environments; Reduce costs; Improve model performance
Kubeflow
Kubernetes-native platform for ML: Run wherever k8s runs; Use k8s to manage ML tasks; CRDs for distributed training
Adopt k8s patterns: Microservices; Manage infra declaratively
Package infrastructure components together: Ksonnet; Move between local -> dev -> test -> prod -> onprem
Support multiple ML frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost, et al.
E2E Example
● GitHub Issue Summarization
  ○ How to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow
  https://github.com/kubeflow/examples/tree/master/github_issue_summarization
● Kubeflow installation with ksonnet
● Persistent disk usage
● JupyterHub
Source: Hamel Husain
Exploration/experimentation
● Choose a dataset
● Slice and dice
● Try out various means of featurization
● Train a number of models & compare
● Plot various statistics along the way
● JupyterHub on k8s
  ○ Security
  ○ Reproducibility
  ○ Resource allocation
  ○ Scale beyond a laptop
  ○ Centralized storage
E2E Example
● Scaling featurization and training
  ○ TFJob
  ○ tensor2tensor
● Model deployment with SeldonIO
● Accessing via a simple web app
● Teardown
Try it yourself
● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
● Katacoda: https://www.katacoda.com/kubeflow
● http://gh-demo.kubeflow.org/
Special thanks: Jeremy Lewi, Ankush Agarwal
Just the beginning
● Easier setup
● Utilize more k8s features
● Add support for packages, frameworks, libraries, and example models
You tell us!
● Get involved
  ○ github.com/kubeflow
  ○ kubeflow.slack.com
  ○ @kubeflow
  ○ kubeflow-discuss@googlegroups.com
Future architectures
"OK Google, build me a classifier." (Future Michelle)
g.co/next18 July 24-27, 2018 San Francisco