S91030 - Hybrid Machine Learning with Kubeflow Pipelines and RAPIDS (Sina Chavoshi)
Cloud AI Strategy: the right approach for the right problem (Building Blocks, Platform, Solutions)
Building Blocks: Sight, Language, Conversation
Solutions / Contact Center: Google Cloud Contact Center AI. (Architecture diagram: the customer reaches a Virtual Agent through the contact center provider's phone interface; Agent Assist supports human agents with chat suggestions backed by a Knowledge Base of PDF/HTML documents; requests are completed by backend fulfillment.)
Cloud AI Platform: Data pipeline (BigQuery, Cloud Dataprep, Cloud Dataflow, Cloud Dataproc); Model development (Cloud ML Engine, Kubeflow, Jupyter Notebooks, ASL); Model deployment and management (Cloud ML Engine, Kubernetes Engine); supported by services, tools, and community.
Building & deploying real-life ML applications is hard and costly because of the lack of tooling that covers end-to-end ML development & deployment.
In addition to the actual ML code...
You have to worry about so much more: configuration, data collection, data verification, feature extraction, analysis tools, process management tools, machine resource management, serving infrastructure, and monitoring; the ML code itself is only a small piece. Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems".
AI problems today (and their solutions)
01 Deployment: brittle, opinionated infrastructure that is hard to productionize and that breaks between cloud and on-prem.
02 Talent: machine learning expertise is scarce; the answer is reusable pipelines.
03 Collaboration: it is difficult to find and leverage existing solutions.
01: Kubeflow (ML microservices): scalable ML services on Kubernetes.
• Easy to get started: out-of-the-box support for top frameworks (PyTorch, Caffe, TensorFlow, XGBoost); Kubernetes manages dependencies and resources.
• Swappable & scalable: library of ML services, GPU support, massive scale.
• Meet customers where they are: GCP, on-prem, on-prem with Cisco.
RAPIDS Product Overview
THE BIG PROBLEM IN DATA SCIENCE. (Workflow diagram: manage data, training, evaluate, deploy; all data flows through a data store, ETL, structured data preparation, model training, visualization, and scoring.) Slow training times for data scientists.
RAPIDS: open GPU data science. Software stack: Python APIs for data preparation (cuDF), model training (cuML), and graph analytics (cuGraph), alongside the Python deep learning frameworks; underneath sit Dask/Spark integration, cuDNN, CUDA, and Apache Arrow on GPU memory.
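In practice this stack is driven from ordinary Python. Here is a minimal sketch, assuming a working RAPIDS installation, of loading data with cuDF and training a cuML model on the GPU; the file name and column names are hypothetical.

```python
# Minimal RAPIDS sketch: cuDF for data preparation, cuML for model training,
# all in GPU memory. The file name and column names are hypothetical.
import cudf
from cuml.linear_model import LinearRegression

# Data preparation (cuDF): read a CSV straight into GPU memory (Apache Arrow layout).
df = cudf.read_csv('transactions.csv')      # hypothetical input file
df = df.dropna()
X = df[['feature_a', 'feature_b']]          # hypothetical feature columns
y = df['target']                            # hypothetical label column

# Model training (cuML): scikit-learn-style estimators that execute on the GPU.
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
print(predictions[:5])
```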
BENCHMARKS: cuIO/cuDF (load and data preparation) and cuML XGBoost, end to end. (Chart: time in seconds, shorter is better, broken into load and data preparation, data conversion, and XGBoost.) Workload: 200 GB CSV dataset; data preparation includes joins and variable transformations. CPU cluster configuration: Apache Spark on nodes with 61 GiB of memory and 8 vCPUs (64-bit platform). DGX cluster configuration: 5x DGX-1 on an InfiniBand network.
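For reference, the GPU side of the XGBoost benchmark above corresponds to training with XGBoost's GPU histogram tree method. A minimal sketch with synthetic data; the values below are illustrative and are not the benchmark's configuration.

```python
# Sketch of GPU-accelerated XGBoost training of the kind benchmarked above.
# Synthetic data and hyperparameters are illustrative only.
import numpy as np
import xgboost as xgb

# Synthetic training set standing in for the prepared 200 GB benchmark dataset.
X = np.random.rand(100_000, 50).astype(np.float32)
y = (np.random.rand(100_000) > 0.5).astype(np.float32)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'gpu_hist',       # GPU histogram algorithm
    'objective': 'binary:logistic',
    'max_depth': 8,
}
booster = xgb.train(params, dtrain, num_boost_round=100)
print(booster.eval(dtrain))
```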
AI Hub & Pipelines: fast & simple adoption of AI. The flywheel of AI adoption:
1. Search & discover: find best-of-breed solutions on the AI Hub which leverage Cloud AI solutions.
2. Deploy: quick one-click implementation of ML pipelines onto Google Cloud Platform.
3. Customize: experiment with and adjust out-of-the-box pipelines to custom use cases.
4. Run in production: deploy customized pipelines in production.
5. Publish: upload & share pipelines within your org or publicly for the best network effect.
02: Reusable Pipelines
Enable developers to build custom ML applications by easily "stitching" and connecting various components.
• Reuse instead of reimplement or reinvent (a sketch follows below)
• Discover, learn, and replicate successful pipelines
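As one illustration of reuse, the Kubeflow Pipelines SDK can load a component that someone else has already published instead of re-implementing it. This is a minimal sketch: the inline component spec is a hypothetical trivial example, and in practice a shared spec would typically be loaded with load_component_from_url() or discovered on the AI Hub.

```python
# Sketch of reusing a shared component with the Kubeflow Pipelines SDK.
# The inline component spec is a hypothetical stand-in for a published one.
from kfp import components

echo_op = components.load_component_from_text("""
name: Echo message
description: Trivial reusable component that prints its input.
inputs:
- {name: message, type: String}
implementation:
  container:
    image: alpine:3.9
    command: [echo, {inputValue: message}]
""")

# The loaded component becomes a factory function that can be "stitched" into
# any pipeline like a locally defined step, e.g. echo_op(message='hello').
```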
What constitutes a Kubeflow Pipeline
● Containerized implementations of ML tasks
○ Containers provide portability, repeatability, and encapsulation
○ A task can be single node or *distributed*
○ A containerized task can invoke other services
● Specification of the sequence of steps
○ Specified via the Python SDK (a minimal sketch follows below)
● Input parameters
○ A "Job" = the pipeline invoked with specific parameters
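A minimal sketch of those pieces in the Python SDK: two hypothetical containerized tasks, a dependency that defines the sequence of steps, and an input parameter whose concrete value at submission time defines a job. The container images are placeholders, and the ContainerOp-style API shown matches the kfp SDK of this talk's era; newer releases may differ.

```python
# Sketch of a Kubeflow Pipeline defined with the Python SDK (kfp).
# Container images and paths are hypothetical placeholders.
import kfp
from kfp import dsl


@dsl.pipeline(name='demo-pipeline',
              description='Containerized preprocess step followed by training.')
def demo_pipeline(data_path: str = 'gs://my-bucket/data.csv'):
    # Containerized task: portable, repeatable, encapsulated.
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',      # hypothetical image
        arguments=['--data-path', data_path],
        file_outputs={'prepared': '/tmp/prepared_path.txt'},
    )

    # Second task; consuming the first task's output defines the sequence of steps.
    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train-xgboost:latest',   # hypothetical image
        arguments=['--train-data', preprocess.outputs['prepared']],
    )
    train.set_gpu_limit(1)   # ask Kubernetes to schedule this step on a GPU


if __name__ == '__main__':
    # Compile to an archive that can be uploaded through the Pipelines UI.
    kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.tar.gz')
```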
03: AI Hub at a glance
1. All AI content in one place: quick discovery of plug & play AI pipelines and other content built by teams across Google and by partners and customers.
2. Fast & simple implementation of AI on GCP: one-click deployment of AI pipelines via Kubeflow on GCP as the go-to platform for AI, plus hybrid and on-premises.
3. Enterprise-grade internal & external sharing: foster reuse by sharing deployable AI pipelines and other content privately within organizations and publicly.
Mission: the one place for everything AI, from experimentation to production.
Public and private AI Hub
• By Google: unique AI assets from Google (AutoML, TPUs, Cloud AI Platform, etc.)
• By partners and customers: public content created, shared & monetized by anyone
• Private content: content shared securely within and with other organizations
Kubeflow Pipelines enable workflow orchestration, rapid reliable experimentation, and the ability to share, re-use & compose pipelines.
Demo
Visual depiction of pipeline topology
View all current and historical runs, grouped as “Experiments”
Rich visualizations of metrics
Clone an existing pipeline
Access to all config params, inputs and outputs for each run
Update parameters and submit
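The same update-and-submit step can also be done programmatically. A minimal sketch with the kfp SDK client, assuming a reachable Kubeflow Pipelines endpoint; the host URL and argument values are hypothetical, and demo_pipeline refers to the pipeline function sketched earlier.

```python
# Sketch of submitting a run with updated parameters via the kfp SDK client
# instead of the UI. Host URL and argument values are hypothetical.
import kfp

client = kfp.Client(host='https://my-kfp-endpoint.example.com')  # hypothetical endpoint

# Each submission with a concrete set of parameters creates a new run ("job").
client.create_run_from_pipeline_func(
    demo_pipeline,    # the pipeline function from the earlier sketch (assumed in scope)
    arguments={'data_path': 'gs://my-bucket/new_data.csv'},
    experiment_name='rapids-demo',
)
```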
Easy comparison of Runs
That’s a wrap.