IBM’s Open-Source Based AI Developer Tools
Sumit Gupta, VP, AI, Machine Learning & HPC, IBM Cognitive Systems
@SumitGup, guptasum@us.ibm.com
March 2019

AI Software Portfolio Strategy
Deliver a comprehensive platform that enables data science at all skill levels
• Prepare Data
• Build AI Models (Machine / Deep Learning)
• Train AI Models: Interactive or Batch, with GPU acceleration
• Deploy & Manage Model Lifecycle: Inference on CPU, GPU, FPGA
• AI Model Performance Monitoring
Scale to Enterprise-wide Deployment
• Multiple data scientists
• Shared Cluster / Hardware Infrastructure
• Hybrid Cloud: Common experience on-premise and in public cloud

IBM Open Source Based AI Stack
• Auto-AI software: PowerAI Vision, IBM Auto-AI
• Watson Studio: Data Preparation, Model Development Environment
• Watson Machine Learning: Runtime Environment (Train, Deploy, Manage Models)
• Watson OpenScale: Model Metrics, Bias, and Fairness Monitoring
• Watson ML Accelerator
• Watson ML CE (WML CE), SnapML
• Accelerated AC922 Power9 Servers
• Storage (Spectrum Scale ESS)
• Runs on x86 & other storage too
• Available on Private Cloud or Public Cloud
Previous Names:
• WML Accelerator = PowerAI Enterprise
• WML Community Ed. = PowerAI-base

Our Focus: Ease of Use & Faster Model Training Times
• Watson ML Accelerator: Distributed Deep Learning (DDL), Auto Hyper-Parameter Optimization (HPO), Elastic Distributed Training (EDT) & Elastic Distributed Inference (EDI)
• IBM Spectrum Conductor: Apache Spark, Cluster Virtualization, Job Orchestration
• Watson ML: Model Management & Execution, Model Life Cycle Management
• Watson ML Community Edition (WML CE): Open Source ML Frameworks, Snap ML, Large Model Support (LMS), DDL-16
• Infrastructure Designed for AI: Power9 or x86 Servers with GPU Accelerators, Storage (ESS)

Snap ML: Distributed High Performance Machine Learning Library
Snap Machine Learning (ML) Library
• APIs for Popular ML Frameworks
• GPU Accelerated: Logistic Regression, Linear Regression, Ridge / Lasso Regression, Support Vector Machines
• Multi-Core CPU (new): Decision Trees, Random Forests, more coming
• Multi-Core, Multi-Socket & GPU Acceleration
• CPU-GPU Memory Management
• Distributed Training: Multi-CPU & Multi-GPU

Why do we need High-Performance ML
• Performance Matters for:
  • Online Re-training of Models
  • Model Selection & Hyper-Parameter Tuning
  • Fast Adaptability to Changes
• Scalability to Large Datasets useful for:
  • Recommendation engines, Advertising, Credit Fraud
  • Space Exploration, Weather
[Chart: Most Popular Data Science Methods. Annotations: "Supported by Snap ML", "Deep Learning", "Support in 2H 19". Source: Kaggle Data Science Survey 2017]

Logistic Regression: 46x Faster
Snap ML (GPU) vs TensorFlow (CPU)
• Google TensorFlow on 90 x86 Servers (CPU-only): 1.1 Hours
• Snap ML on 4 Power9 Servers with GPUs: 1.53 Minutes
• Workload: Advertising click-through rate prediction, Criteo dataset: 4 billion examples, 1 million features
[Chart: Runtime (Minutes), Google TensorFlow vs Snap ML]

Ridge Regression: 3x Faster
Power-NVLink-GPUs vs x86-PCIe-GPUs
• x86 Server with 1 GPU, runtime (s) over 5 runs: 106.71, 102.10, 94.94, 86.87, 86.22
• Power Server with 1 GPU, runtime (s) over 5 runs: 30.75, 30.64, 30.66, 30.63, 29.83
• Workload: Predict volatility of stock price from 10-K textual financial reports, 482,610 examples x 4.27M features
[Chart: RunTime (s) vs Runs 1 to 5]

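The ridge regression headline can be checked from the per-run runtimes printed on the chart. The sketch below is only an illustration of that arithmetic; it assumes the series near 30 s belongs to the Power server and the series between 86 s and 107 s belongs to the x86 server, which is the reading consistent with the "3x Faster" title.

```python
from statistics import mean

# Per-run runtimes in seconds, read off the ridge regression chart.
# Assumption: the ~30 s series is the Power server, the other is x86.
x86_runs = [106.71, 102.10, 94.94, 86.87, 86.22]
power_runs = [30.75, 30.64, 30.66, 30.63, 29.83]


def speedup(baseline, candidate):
    """Ratio of the mean baseline runtime to the mean candidate runtime."""
    return mean(baseline) / mean(candidate)


ridge_speedup = speedup(x86_runs, power_runs)
print(f"mean x86: {mean(x86_runs):.2f} s, mean Power: {mean(power_runs):.2f} s")
print(f"speedup: {ridge_speedup:.2f}x")
```

Under that reading the means are about 95.4 s and 30.5 s, a ratio of roughly 3.1x, which matches the slide's rounded 3x figure.
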
Snap ML is 2-4x Faster than scikit-learn – CPU-only
• Decision Trees: 3.8x faster (Snap ML on Power vs sklearn on x86)
• Random Forests: 4.2x faster (Snap ML on Power vs sklearn on x86)
• Datasets: creditcard, susy, higgs, epsilon
• Per-dataset speedups shown on the charts range from 2.0x to 4.2x
[Charts: Runtime (sec) per dataset, SnapML (P9) vs sklearn (x86)]

Summary of Performance Results for Snap ML
• GPU vs CPU: Snap ML vs scikit-learn, Linear Models: 20-40x
• Power vs x86 with GPUs: Snap ML, Linear Models: 3x
• CPU Only, Power vs x86: Snap ML vs scikit-learn, Tree Models: 2-4x

Large Model Support (LMS) Enables Higher Accuracy via Larger Models
• Store Large Models & Dataset in System Memory
• Transfer One Layer at a Time to GPU
• IBM AC922 Power9 Server: Memory to CPU at 170 GB/s, CPU to GPU over NVLink at 150 GB/s
• CPU-GPU NVLink is 5x Faster than Intel x86 PCI-Gen3
• TensorFlow with LMS: Power + 4 GPUs is 4.7x Faster than x86 + 4 GPUs
• Benchmark: 500 Iterations of Enlarged GoogleNet model on Enlarged ImageNet Dataset (2240x2240), mini-batch size = 15; both servers with 4 NVIDIA V100 GPUs
[Chart: Images / sec, Power + 4 GPUs vs x86 + 4 GPUs]

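The idea of keeping a model in system memory and moving one layer at a time to the GPU can be illustrated with some back-of-the-envelope arithmetic. This is a deliberately simplified sketch, not how LMS is implemented: the layer sizes and the 16 GB GPU capacity are hypothetical, and the PCIe figure is simply the slide's 150 GB/s NVLink number divided by its "5x" claim.

```python
# Hypothetical per-layer memory footprints in GB (not taken from the slides).
layer_sizes_gb = [6.0, 10.0, 12.0, 8.0, 4.0]

GPU_MEMORY_GB = 16.0                  # assumption: a GPU with 16 GB of memory
NVLINK_GB_PER_S = 150.0               # CPU-GPU NVLink figure from the slide
PCIE_GB_PER_S = NVLINK_GB_PER_S / 5   # derived from the slide's "5x Faster" claim


def peak_all_resident(sizes_gb):
    """GPU memory needed if every layer stays on the GPU at once."""
    return sum(sizes_gb)


def peak_one_layer_at_a_time(sizes_gb):
    """GPU memory needed if only the largest single layer is resident."""
    return max(sizes_gb)


def transfer_seconds(total_gb, bandwidth_gb_per_s):
    """Time to move total_gb over a link, ignoring latency and overlap."""
    return total_gb / bandwidth_gb_per_s


model_gb = peak_all_resident(layer_sizes_gb)
fits_resident = model_gb <= GPU_MEMORY_GB
fits_swapped = peak_one_layer_at_a_time(layer_sizes_gb) <= GPU_MEMORY_GB
nvlink_s = transfer_seconds(model_gb, NVLINK_GB_PER_S)
pcie_s = transfer_seconds(model_gb, PCIE_GB_PER_S)

print(f"model size: {model_gb} GB, fits fully on GPU: {fits_resident}")
print(f"fits one layer at a time: {fits_swapped}")
print(f"one pass over all layers: NVLink {nvlink_s:.2f} s, PCIe {pcie_s:.2f} s")
```

The point of the sketch is that swapping trades GPU capacity for link traffic, so the CPU-GPU bandwidth sets how much that trade costs per pass.
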
Distributed Deep Learning (DDL): Near Ideal (95%) Scaling to 256 GPUs
• Deep learning training takes days to weeks
• DDL in WML CE extends TensorFlow & enables scaling to 100s of servers
• Automatically distribute and train on large datasets to 100s of GPUs
• Chart annotations: "Runs for Days" at low GPU counts, "Runs within Hours" at high GPU counts
• Benchmark: ResNet-50, ImageNet-1K, Caffe with PowerAI DDL, running on S822LC Power System
[Chart: Speedup vs Number of GPUs (4, 16, 64, 256), Ideal Scaling vs DDL Actual Scaling]

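As a rough illustration of what "95% scaling" means, the sketch below converts a scaling efficiency into a speedup and a wall-clock time. It assumes efficiency is defined as actual speedup divided by ideal linear speedup, and the 240-hour baseline job is a hypothetical number, not a measurement from the slide.

```python
def speedup_at(efficiency, num_gpus):
    """Speedup over the baseline when scaling at the given efficiency."""
    return efficiency * num_gpus


def scaling_efficiency(measured_speedup, num_gpus):
    """Fraction of ideal linear speedup that was actually achieved."""
    return measured_speedup / num_gpus


def training_hours(baseline_hours, efficiency, num_gpus):
    """Wall-clock time after scaling a job that takes baseline_hours."""
    return baseline_hours / speedup_at(efficiency, num_gpus)


ddl_speedup = speedup_at(0.95, 256)
hypothetical_hours = training_hours(240.0, 0.95, 256)  # 240 h baseline is made up

print(f"speedup at 256 GPUs: {ddl_speedup:.1f}x")
print(f"a 240-hour job would take about {hypothetical_hours:.2f} hours")
```

At 95% efficiency, 256 GPUs give a speedup of about 243x rather than the ideal 256x, which is the gap between the two curves on the chart.
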
Auto Hyper-Parameter Optimization (HPO) in WML Accelerator
• Lots of Hyperparameters: Learning rate, Decay rate, Batch size, Optimizers (Gradient Descent, Momentum, ..)
• Manual Process: Manually Choose Parameters → Train Model → Change Hyperparameters; Run Model Training 100s of Times
• WML Accelerator Auto-Hyperparameter Optimizer (Auto-HPO): Training Job 1, Training Job 2, ..., Training Job n → Monitor & Prune → Select Best Hyperparameters
• Auto-HPO has 3 search approaches: Random, Tree-based Parzen Estimator (TPE), Bayesian
• Runs on IBM Spectrum Conductor running Spark

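Of the three search approaches, random search is the simplest to show in a few lines. The following is a generic sketch, not the WML Accelerator API: `sample_config`, `toy_validation_loss`, and the hyperparameter ranges are all hypothetical, and the toy loss function stands in for a real training job.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter set (ranges are illustrative assumptions)."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
        "decay_rate": rng.uniform(0.8, 0.99),
        "batch_size": rng.choice([16, 32, 64, 128]),
    }


def toy_validation_loss(config):
    """Stand-in for a training job; lowest near lr=1e-2, decay=0.9, batch=64."""
    lr_term = (math.log10(config["learning_rate"]) + 2) ** 2
    decay_term = (config["decay_rate"] - 0.9) ** 2
    batch_term = 0.1 * (math.log2(config["batch_size"]) - 6) ** 2
    return lr_term + decay_term + batch_term


def random_search(num_trials, seed=0):
    """Evaluate num_trials random configs and return the best (loss, config)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        config = sample_config(rng)
        trials.append((toy_validation_loss(config), config))
    return min(trials, key=lambda trial: trial[0])


best_loss, best_config = random_search(num_trials=100)
print(f"best loss: {best_loss:.4f}")
print(f"best config: {best_config}")
```

In a real setup each call to the loss function would be a separate training job, which is why running the trials in parallel and pruning weak ones early matters.
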
Elastic Distributed Training (EDT)
• Dynamically Reallocates GPUs within milliseconds
• Increases Job Throughput and Server / GPU Utilization
• Works with Spark & AI Jobs
• Works with Hybrid x86 & Power Cluster
• Setup: 2 Servers with 4 GPUs each, total 8 GPUs
• Available Policies: Fair share, Preemption, Priority
Timeline:
• T0: Job 1 starts, uses all available GPUs
• T1: Job 2 starts, Job 1 gives up 4 GPUs
• T2: Job 2 gets higher priority, Job 1 gives up GPUs
• T3: Job 1 finishes, Job 2 uses all GPUs
[Charts: GPU Slots (0 to 8) vs Time (8:09 to 8:21) for Job 1 and Job 2]

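The T0 to T3 timeline can be mimicked with a small priority-weighted allocator. This is only a sketch of the general idea, not the EDT scheduler: `allocate_gpus` is a hypothetical helper, and the 3:1 priority ratio at T2 is an assumed value because the slide does not state how many GPUs Job 1 gives up at that step.

```python
def allocate_gpus(total_gpus, priorities):
    """Split total_gpus among running jobs in proportion to priority weights."""
    if not priorities:
        return {}
    total_weight = sum(priorities.values())
    shares = {job: total_gpus * w / total_weight for job, w in priorities.items()}
    allocation = {job: int(share) for job, share in shares.items()}
    # Hand any GPUs lost to rounding to the jobs with the largest remainders.
    leftover = total_gpus - sum(allocation.values())
    by_remainder = sorted(
        shares, key=lambda job: shares[job] - allocation[job], reverse=True
    )
    for job in by_remainder[:leftover]:
        allocation[job] += 1
    return allocation


TOTAL_GPUS = 8  # 2 servers with 4 GPUs each, as on the slide

# Running jobs and their priority weights at each step of the timeline.
timeline = [
    ("T0", {"job1": 1}),             # Job 1 runs alone
    ("T1", {"job1": 1, "job2": 1}),  # Job 2 starts, fair share
    ("T2", {"job1": 1, "job2": 3}),  # Job 2 gets higher priority (assumed 3:1)
    ("T3", {"job2": 1}),             # Job 1 has finished
]

schedule = {label: allocate_gpus(TOTAL_GPUS, prio) for label, prio in timeline}
for label, allocation in schedule.items():
    print(label, allocation)
```

Recomputing the allocation whenever a job starts, finishes, or changes priority is what keeps all 8 GPUs busy throughout the timeline.
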
PowerAI Vision: “Point-and-Click” AI for Images & Video
Label Image or Video Data → Auto-Train AI Model → Package & Deploy AI Model

Core use cases
• Image Classification
• Object Detection
• Image Segmentation

Automatic Labeling using PowerAI Vision
Manually Label Some Image / Video Frames → Train DL Model → Auto-Label Full Dataset with Trained DL Model → Manually Correct Labels on Some Data → Repeat Till Labels Achieve Desired Accuracy

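The labeling loop above can be sketched as ordinary control flow. Everything in this example is a toy stand-in and not PowerAI Vision code: the "images" are random numbers, `human_label` plays the human annotator, and a one-dimensional threshold replaces the deep learning model.

```python
import random


def human_label(x):
    """Stand-in for a human annotator (hypothetical ground-truth rule)."""
    return int(x > 0.5)


def train_threshold_model(labeled):
    """Toy 'model': a threshold midway between the two class means."""
    pos = [x for x, y in labeled.items() if y == 1]
    neg = [x for x, y in labeled.items() if y == 0]
    if not pos or not neg:  # only one class seen so far
        return sum(labeled) / len(labeled)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def auto_label_loop(dataset, initial=5, review=20, target=0.95,
                    max_rounds=10, seed=0):
    rng = random.Random(seed)
    # Step 1: manually label a few items.
    labeled = {x: human_label(x) for x in rng.sample(dataset, initial)}
    for round_number in range(1, max_rounds + 1):
        # Step 2: train on the labels collected so far.
        threshold = train_threshold_model(labeled)
        # Step 3: auto-label the full dataset.
        auto_labels = {x: int(x > threshold) for x in dataset}
        # Step 4: manually check and correct a sample of the auto-labels.
        reviewed = rng.sample(dataset, review)
        corrections = {x: human_label(x) for x in reviewed}
        accuracy = sum(auto_labels[x] == corrections[x] for x in reviewed) / review
        labeled.update(corrections)
        # Step 5: repeat until the reviewed sample is accurate enough.
        if accuracy >= target:
            break
    final_labels = {**auto_labels, **labeled}  # manual labels take precedence
    return final_labels, accuracy, round_number


data_rng = random.Random(1)
dataset = [data_rng.random() for _ in range(500)]
final_labels, accuracy, rounds = auto_label_loop(dataset)
print(f"stopped after {rounds} round(s), reviewed-sample accuracy {accuracy:.2f}")
```

Each round adds the manually corrected items to the training set, so the amount of human labeling grows only as far as the accuracy target requires.
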
• Retail Analytics: Track how customers navigate store, identify fraudulent actions, detect low inventory
• Worker Safety Compliance: Zone monitoring, heat maps, detection of loitering, ensure worker safety compliance
• Remote Inspection & Asset Management: Identify faulty or worn-out equipment in remote & hard to reach locations

Quality Inspection Use Cases
• Semiconductor Manufacturing
• Electronics Manufacturing
• Travel & Transportation
• Oil & Gas, Utilities
• Robotic Inspection
• Steel Manufacturing
• Manufacturing
• Aerospace & Defense

AI Developer Box & AI Starter Kit
Power AI DevBox
• Power9 + GPU Desktop PC: $3,449
• Free 30-Day Licenses for PowerAI Vision & WML Accelerator (free to Academia)
• Order from: https://raptorcs.com/POWERAI/
AI Starter Kit
• WML Accelerator Pre-installed (formerly called PowerAI Enterprise)
• 2 AC922 Accelerated Servers + 1 P9 Linux Storage Server

500+ Clients using AI on Power Systems
Power AI Clients at THINK 2019

IBM AI Meetups Community Grew 10x in 9 Months
From 6K to 85K Members in 9 Months
[Chart: Members (left axis, 0 to 100,000) and Groups (right axis, 0 to 80) by month, Jun-18 to Feb-19]
https://www.meetup.com/topics/powerai/

Summary
• Watson ML: Machine / Deep Learning Toolkit
• Snap ML: Fast Machine Learning Framework
• Power AI DevBox & AI Starter Kit

Get Started Today with Machine & Deep Learning
• Build a Data Science Team: Your Developers Can Learn at http://cognitiveclass.ai
• Identify a Low Hanging Use Case
• Figure Out Data Strategy
• Consider Pre-Built AI APIs
• Hire Consulting Services
Get Started Today at www.ibm.biz/poweraideveloper

Additional Details

Why are Linear & Tree Models Useful?
• Fast Training: GLMs can scale to datasets with billions of examples and/or features and still train in minutes to hours
• Need Less Data: Machine learning models can train to “good-enough” accuracy with much less data than deep learning requires
• Interpretability: Linear models explicitly assign an importance to each input feature; tree models explicitly show the path to a decision
• Less Tuning: Linear models involve far fewer parameters than more complex models (GBMs, Neural Nets)

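The interpretability point can be made concrete with a tiny example. The sketch below is illustrative only: the feature names, weights, bias, and tree thresholds are hypothetical values chosen for a credit-fraud style example, not parameters from any trained model.

```python
# Hypothetical linear model: one weight per input feature plus a bias.
weights = {"amount": 0.8, "hour_of_day": -0.1, "num_prior_chargebacks": 1.5}
bias = -2.0


def linear_contributions(features):
    """Per-feature contribution to the score: weight times feature value."""
    return {name: weights[name] * value for name, value in features.items()}


def linear_score(features):
    """Bias plus the sum of all per-feature contributions."""
    return bias + sum(linear_contributions(features).values())


def tree_decision_path(features):
    """Hypothetical two-level decision tree that records each test it takes."""
    path = []
    if features["num_prior_chargebacks"] > 0.5:
        path.append("num_prior_chargebacks > 0.5")
        if features["amount"] > 1.0:
            path.append("amount > 1.0")
            return path, "flag"
        path.append("amount <= 1.0")
        return path, "review"
    path.append("num_prior_chargebacks <= 0.5")
    return path, "pass"


example = {"amount": 2.0, "hour_of_day": 3.0, "num_prior_chargebacks": 1.0}
contributions = linear_contributions(example)
score = linear_score(example)
path, decision = tree_decision_path(example)

print("contributions:", contributions)
print(f"score: {score:.2f}")
print("tree path:", " -> ".join(path), "=>", decision)
```

The linear model's explanation is the list of per-feature contributions, while the tree's explanation is the sequence of tests that led to the outcome.
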