From inception to insight: Accelerating AI productivity with GPUs
John Zedlewski, Director, RAPIDS Machine Learning @ NVIDIA
Ramesh Radhakrishnan, Technologist, Server OCTO @ Dell EMC
Challenges
Data sizes continue to grow; prototyping and production diverge

• Large-scale cluster (Spark, Hadoop): high throughput on the full data, but a "tools gap" — Python or R code must be rewritten as Spark/Hadoop jobs to scale to the cluster, and high latency on the cluster leads to slower iteration.
• Workstation (Python): fast iteration, but small data subsets make it hard to build realistic models.
• RAPIDS on GPU (RAPIDS + Dask): consistent tools on workstation or cluster, high throughput and low latency, on the full data or large subsets.
Data Processing Evolution
Faster data access, less data movement

• Hadoop: processing reads from disk — each stage (Query, ETL, ML Train) reads from and writes back to HDFS.
• Spark: in-memory processing, 25–100x improvement; less code, language flexible; a single HDFS read, then primarily in-memory.
• Traditional GPU processing: 5–10x improvement, but more code and language rigid; data shuttles between CPU and GPU around each stage (HDFS read → Query → write → read → ETL → write → read → ML Train), substantially on GPU.
Data Movement and Transformation
The bane of productivity and performance

[Diagram: each application (APP A, APP B) loads and reads its own copy of the data, and every hand-off between applications — and between CPU and GPU — requires a copy-and-convert step.]
Data Movement and Transformation
What if we could keep data on the GPU?

[Diagram: the same APP A / APP B pipeline as the previous slide, but with the copy-and-convert steps eliminated by keeping the data resident in GPU memory.]
Data Processing Evolution (continued)
Faster data access, less data movement

• Hadoop, Spark, and traditional GPU processing: as on the previous slide.
• RAPIDS: 50–100x improvement; same code, language flexible; a single Arrow read, then Query, ETL, and ML Train run primarily on GPU.
RAPIDS
Scale up and out with accelerated GPU data science

• Data preparation: cuDF, cuIO (analytics)
• Model training: cuML (machine learning), cuGraph (graph analytics), PyTorch / Chainer / MxNet (deep learning)
• Visualization: cuXfilter, pyViz
• All components share GPU memory, with Dask for scaling out.
RAPIDS (continued)
Scale up and out with accelerated GPU data science

The same stack, with familiar APIs on top: cuDF exposes a Pandas-style API, cuML a scikit-learn-style API, and cuGraph a NetworkX-style API.
Scale up with RAPIDS

• PyData today: NumPy, Pandas, Scikit-Learn, Numba, and many more — a single CPU core, in-memory data.
• RAPIDS and others accelerate the same workloads on a single GPU: NumPy → CuPy/PyTorch/..., Pandas → cuDF, Scikit-Learn → cuML, Numba → Numba (which already targets GPUs).
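A minimal sketch of the "scale up" swap, assuming CuPy is installed: CuPy mirrors the NumPy API, so moving an array computation onto the GPU is often a one-line change.

    import numpy as np
    import cupy as cp

    x_cpu = np.random.rand(10_000_000)
    x_gpu = cp.asarray(x_cpu)            # copy the host array into GPU memory

    y_cpu = np.sqrt(x_cpu).sum()         # NumPy: runs on a single CPU core
    y_gpu = float(cp.sqrt(x_gpu).sum())  # same expression, runs on the GPU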
Scale out with RAPIDS + Dask with OpenUCX

• Dask scales out the PyData stack across cores and machines: NumPy → Dask Array, Pandas → Dask DataFrame, Scikit-Learn → Dask-ML, ... → Dask Futures.
• RAPIDS + Dask with OpenUCX combines both directions: multi-GPU on a single node (e.g. DGX) or across a cluster (a minimal sketch follows).
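A minimal sketch of the multi-GPU case, assuming the dask-cuda package: LocalCUDACluster starts one Dask worker per visible GPU, and dask_cudf partitions a DataFrame across them (the file pattern and column names are hypothetical).

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    cluster = LocalCUDACluster()             # one worker process per GPU
    client = Client(cluster)

    df = dask_cudf.read_csv("data-*.csv")    # partitions spread over the GPUs
    result = df.groupby("user_id").value.mean().compute()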
Faster Speeds, Real-World Benefits
cuIO/cuDF (load and data preparation), cuML, XGBoost, end-to-end

[Benchmark chart, time in seconds (shorter is better): bars for cuIO/cuDF load and data prep, data conversion, and XGBoost on the CPU cluster vs. the DGX cluster; reported values: 8,762, 6,148, 3,925, 3,221, 322, and 213.]

Benchmark: 200 GB CSV dataset; data prep includes joins and variable transformations.
CPU cluster configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark.
DGX cluster configuration: 5x DGX-1 on an InfiniBand network.
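A hedged sketch of the XGBoost stage of such a pipeline, assuming an XGBoost build with GPU support; the input file and "label" column are hypothetical. Data loaded with cuDF can be handed to XGBoost and trained with the GPU histogram algorithm.

    import xgboost as xgb
    import cudf

    gdf = cudf.read_csv("train.csv")         # hypothetical training data
    dtrain = xgb.DMatrix(gdf.drop(columns=["label"]), label=gdf["label"])

    params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
    bst = xgb.train(params, dtrain, num_boost_round=100)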
Dask + cuML
Machine Learning at Scale
Machine Learning
More models, more problems

[Diagram: the RAPIDS stack again, highlighting cuML for model training alongside cuDF/cuIO (data preparation), cuGraph (graph analytics), deep learning frameworks, and visualization, all sharing GPU memory.]
Algorithms
GPU-accelerated Scikit-Learn

• Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors
• Inference: Random Forest / GBDT inference
• Clustering: K-Means, DBSCAN, Spectral Clustering
• Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding
• Time Series: Holt-Winters, Kalman Filtering
• Hyper-parameter Tuning: Cross Validation
• More to come! (A usage sketch of one entry follows.)
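As an example of one entry from this list, a hedged sketch using cuML's RandomForestClassifier, which mirrors the scikit-learn estimator interface; the synthetic data is illustrative only.

    import cupy as cp
    from cuml.ensemble import RandomForestClassifier

    X = cp.random.rand(1000, 10, dtype=cp.float32)  # features, already on the GPU
    y = (X[:, 0] > 0.5).astype(cp.int32)            # simple synthetic labels

    clf = RandomForestClassifier(n_estimators=100, max_depth=8)
    clf.fit(X, y)                                   # same fit/predict API as sklearn
    preds = clf.predict(X)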
RAPIDS matches common Python APIs
GPU-Accelerated Clustering

    from sklearn.datasets import make_moons
    import cudf

    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = cudf.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

Find Clusters

    from cuml import DBSCAN

    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)  # DBSCAN has no separate predict(); fit_predict returns the cluster labels
Benchmarks: single-GPU cuML vs scikit-learn
1x V100 vs. 2x 20-core CPU

[Benchmark chart: per-algorithm speedups of cuML on a single V100 over scikit-learn on two 20-core CPUs.]
Why Dask?

• PyData native: built on top of NumPy, Pandas, Scikit-Learn, etc. (easy to migrate), with the same APIs (easy to train) and the same developer community (well trusted).
• Scales: easy to install and use on a laptop; scales out to thousand-node clusters.
• Popular: the most common parallelism framework today at PyData and SciPy conferences.
• Deployable: HPC (SLURM, PBS, LSF, SGE), cloud (Kubernetes), Hadoop/Spark (YARN) — see the deployment sketch below.
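As one concrete deployment path from the list above, a hedged sketch using the optional dask-jobqueue package on a SLURM-managed HPC system; the queue name and worker sizes are hypothetical.

    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client

    cluster = SLURMCluster(queue="gpu", cores=8, memory="64GB")
    cluster.scale(jobs=10)      # ask SLURM for 10 worker jobs
    client = Client(cluster)    # Dask code now runs on the allocation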
Using Dask in Practice
Familiar Python APIs

    # Pandas: one file, one CPU core
    import pandas as pd
    df = pd.read_csv("data.csv")
    df.groupby(df.user_id).value.mean()

    # dask_cudf: the same API across many files and GPUs
    import dask_cudf
    df = dask_cudf.read_csv("data-*.csv")
    df.groupby(df.user_id).value.mean().compute()

Dask supports a variety of data structures and backends.
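The .compute() call is the key difference: Dask builds a lazy task graph and only executes it when asked. A minimal sketch, reusing the same hypothetical files as above:

    import dask_cudf

    df = dask_cudf.read_csv("data-*.csv")
    agg = df.groupby(df.user_id).value.mean()  # lazy: builds a task graph
    result = agg.compute()                     # runs the graph, returns a concrete result
    df = df.persist()                          # or keep partitions materialized in GPU memory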
A quick demo...
RAPIDS
How do I get the software?

• https://github.com/rapidsai
• https://ngc.nvidia.com/registry/nvidia-rapidsai-rapidsai
• https://anaconda.org/rapidsai/
• https://hub.docker.com/r/rapidsai/rapidsai/
AI is more than Model Training
Discovery → Exploration → Modeling → Findings → Operationalize

• Discovery: identify and define the business problem; capture business requirements; preliminary ROI analysis.
• Exploration: exploratory data analysis; acquire, prepare & enrich data.
• Modeling: model development; modeling & performance evaluation.
• Findings: deliver insights & recommendations; measure business effectiveness & ROI.
• Operationalize: deploy the solution at scale; promote user adoption; promote business enablement; continuous model improvement.

Source: adapted from Dell Tech data science solutions learnings & presentation
A vision for the end-to-end ML journey – Fluid and Flexible
Dell Tech AI offerings: cloud to in-house, one vendor with no lock-in

• Go-to-market: direct & consultative sales, system integrator partners, Dell Tech consulting services.
• Software partner ecosystem: enterprise ISV ML software, accelerator virtualization and pooling, open-source deep learning frameworks.
• Cloud foundation: hybrid, public, and private cloud.
• Infrastructure: Ready Bundles for ML and DL, hosted private cloud, hybrid cloud appliances, public cloud & distributed edge.
• Data preparation: Boomi connectors, partner software.
• Storage: Dell EMC enterprise storage, Elastic Cloud Storage, public cloud storage.
AI and Deep Learning Workstation Use Cases
Data science sandboxes and production development

[Diagram: NGC (NVIDIA GPU Cloud) containers — ML frameworks, RAPIDS, PyTorch — running on Dell Precision workstations with 1–3 GPUs each; data on Isilon storage, accessed either by copy-and-stage or in place.]
Pre-verified GPU-accelerated Deep Learning Platforms
White papers, best practices, performance tests, dimensioning guidelines

• Compute: Dell Precision workstations, Dell PowerEdge servers, Dell DSS8440, DGX-1, DGX-2, and hyper-converged Ready Solutions.
• Storage: Isilon F, H, and A nodes as appropriate, plus ECS object store — Isilon as the foundation data lake for AI platforms.
• NGC containers (ML frameworks, RAPIDS, PyTorch) across all platforms; some configurations currently pre-verified, others to be verified in H2 2019.
THANK YOU

Ramesh Radhakrishnan — TODO insert email / Twitter
John Zedlewski — @zstats, jzedlewski@nvidia.com
Join the Movement
Everyone can help!

• Apache Arrow: https://arrow.apache.org/ — @ApacheArrow
• RAPIDS: https://rapids.ai — @RAPIDSAI
• Dask: https://dask.org — @Dask_dev
• GPU Open Analytics Initiative: http://gpuopenanalytics.com/ — @GPUOAI

Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed!
cuDF
RAPIDS
GPU-accelerated data wrangling and feature engineering

[Diagram: the RAPIDS stack, highlighting cuDF and cuIO for data preparation alongside model training (cuML, cuGraph, deep learning frameworks) and visualization, all sharing GPU memory.]
GPU-Accelerated ETL
The average data scientist spends 90+% of their time in ETL as opposed to training models
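A hedged sketch of typical cuDF ETL, mirroring the pandas API (file and column names are hypothetical): joins, feature transformations, and aggregations all run on the GPU.

    import cudf

    users = cudf.read_csv("users.csv")
    events = cudf.read_csv("events.csv")

    df = events.merge(users, on="user_id", how="left")   # GPU join
    df["log_value"] = df["value"].log()                  # feature engineering on-device
    daily = df.groupby("day").agg({"log_value": "mean"}) # GPU aggregation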