David Aronchick, Head of OSS ML Strategy, Azure
Seth Juarez, Senior Cloud Developer Advocate, Azure
One Year Ago...
What is Machine Learning?
Machine Learning is a way of solving problems without explicitly knowing how to create the solution.
But ML is hard!
Four Years Ago...
Kubernetes
Cloud Native Apps
Cloud Native ML?
Platform
[Diagram: building a model is only one small piece of an ML platform]
• Data ingestion
• Data analysis
• Data validation
• Data splitting
• Data transformation
• Building a model / Trainer
• Model validation
• Training at scale
• Roll-out
• Serving
• Monitoring
• Logging
KubeCon 2017
Make it Easy for Everyone to Develop, Deploy and Manage Portable, Distributed ML on Kubernetes
[Diagram: Experimentation, Training, Cloud]
Cloud Native ML!
Momentum!
• ~4000 commits
• ~200 community contributors
• ~50 companies contributing, including:
Community Contributions
[Chart: Google vs. non-Google contributions, Kubernetes vs. Kubeflow]
Critical User Journey Comparison

2017:
• Experiment with Jupyter
• Distribute your training with TFJob
• Serve your model with TF Serving

2019:
• Setup locally with MiniKF
• Access your cluster with Istio/Ingress
• Ingest your data with Pachyderm
• Transform your data with TF.T
• Analyze the data with TF.DV
• Experiment with Jupyter
• Hyperparam sweep with Katib
• Distribute your training with TFJob
• Analyze your model with TF.MA
• Serve your model with Seldon
• Orchestrate everything with KF.Pipelines
Community Contribution: Katib from NTT
• Pluggable microservice architecture for HP tuning
  • Different optimization algorithms
  • Different frameworks
• StudyJob (K8s CRD)
  • Hides complexity from the user
  • No code needed to do HP tuning (see the sketch below)
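To make the "no code" claim concrete: a StudyJob is just a Kubernetes custom object, so it can be submitted like any other resource. Below is a minimal sketch using the official Python kubernetes client; the CRD group/version and the spec field names follow Katib's v1alpha1 StudyJob but are illustrative and should be checked against the installed release (the usual route is simply kubectl apply -f studyjob.yaml):

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

# Illustrative StudyJob body; field names approximate Katib's v1alpha1 CRD.
study_job = {
    "apiVersion": "kubeflow.org/v1alpha1",
    "kind": "StudyJob",
    "metadata": {"name": "random-example", "namespace": "kubeflow"},
    "spec": {
        "studyName": "random-example",
        "optimizationtype": "maximize",
        "objectivevaluename": "Validation-accuracy",
        "optimizationgoal": 0.99,
        "suggestionSpec": {"suggestionAlgorithm": "random"},
        "parameterconfigs": [
            {"name": "--lr", "parametertype": "double",
             "feasible": {"min": "0.01", "max": "0.03"}},
        ],
    },
}

# Katib's controller watches for StudyJob objects and runs the study.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1alpha1",
    namespace="kubeflow", plural="studyjobs", body=study_job)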
Community Contribution: Argo from Intuit
• Argo CRD for workflows
• Argo CRD is the engine for Pipelines
• Argo CD for GitOps
Community Contribution: NB & Storage from Arrikto
• Core Notebook Experience
  • 0.4: New JupyterHub-based UI
  • 0.5: K8s-Native Notebooks UI
• Pipelines: Support for local storage
  • Multiple Persistent Volumes
• MiniKF: All-in-one packaging for seamless local deployments
Community Contribution: TensorRT from NVIDIA
• Production datacenter inferencing server
• Maximize real-time inference performance of GPUs
• Multiple models per GPU per node
• Supports heterogeneous GPUs & multi-GPU nodes
• Integrates with orchestration systems and auto scalers via latency and health metrics
Introducing Kubeflow 0.5
What’s in the box?
• UX investments
  • First-class notebooks & central dashboard
  • Build/Train/Deploy from notebook
  • Better multi-user support
  • A new web-based spawner
• Enterprise readiness
  • Better namespace support
  • API stability
  • Upgradability with preservation of historical metadata
• Advanced composability & tooling
  • Advanced support for calling out to web services
  • Ability to specify GPUs/TPUs for pipeline steps (see the sketch below)
  • New metadata backend
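As one concrete example from the list above, accelerators for a pipeline step are requested directly in the Pipelines Python SDK. A minimal sketch, assuming the kfp package of the 0.5 era; the pipeline name and trainer image are placeholders:

import kfp.dsl as dsl

@dsl.pipeline(name="gpu-example", description="Request a GPU for one step")
def gpu_pipeline():
    # "trainer-image" stands in for a real training container.
    train = dsl.ContainerOp(name="train", image="trainer-image")
    # Ask Kubernetes to schedule this step on a node with one NVIDIA GPU.
    train.set_gpu_limit(1)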
Better/Faster/Production Notebooks!
User Goal = Just give me a notebook!
Problem:
• Setting up a notebook is O(easy)
• Setting up a rich, production-ready notebook is O(hard)
• Setting up a rich, production-ready notebook that works anywhere, on any cloud, with a minimum of changes is O(very very hard)
Better/Faster/Production Notebooks!
Setting up a notebook is easy! Except…
• Custom libraries
• HW provisioning (especially GPUs) & drivers
• Portability (between laptop and clouds)
• Security profiles
• Service accounts
• Credentials
• Lots more…

$ curl -O https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
$ bash Anaconda3-5.0.1-Linux-x86_64.sh
$ conda create -y -n mlenv python=2 pip scipy gevent sympy
$ source activate mlenv
$ pip install tensorflow==1.13.0   # or tensorflow-gpu==1.7.0
$ open http://127.0.0.1:8080
Better/Faster/Production Notebooks! Solution – Declarative Data Science Environments with Kubeflow!
Better/Faster/Production Notebooks!
Setting up a declarative environment is easy!

$ kfctl.sh init --platform aks \
    --project my-project
$ kfctl.sh generate platform
$ kfctl.sh apply platform
$ kfctl.sh generate k8s
$ kfctl.sh apply k8s

Add your custom components!

# Add Seldon Server
$ ks pkg install kubeflow/seldon
# Add XGBoost
$ ks pkg install kubeflow/xgboost
# Add hyperparameter tuning
$ ks pkg install kubeflow/katib
[Diagram: Experimentation, Training, Cloud; IT Ops: "I Got You!"]
DEMO
Rich Container-Based Pipelines
User Goal = Repeatable, multi-stage ML training
Problem:
• Tools not built to be containerized/orchestrated
• Coordinating between steps often requires writing custom code
• Different tools have different infra requirements
Rich Container-Based Pipelines
[Diagram: Ingestion (TF.Transform) → Training (TF.Job) → Serving (TF.Serving) → ???]
Pipelines should:
• Be cloud native (microservice oriented, loosely coupled) and ML aware
• Support both data- and task-driven workflows
• Understand non-Kubeflow-based services (e.g. external to the cluster)
Rich Container-Based Pipelines
Solution – Kubeflow Pipelines!
Kubeflow Pipeline Details
• Containerized implementations of ML tasks
  • Encapsulates all the dependencies of a step with no conflicts
  • Step can be singular or distributed
  • Can also involve external services
• Specified via Python SDK (full sketch below)
  • Inputs/outputs/parameters can be chained together
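Putting those details together: a pipeline is an ordinary Python function decorated with @dsl.pipeline and compiled into a package that the Pipelines UI can run. A minimal sketch, assuming the kfp SDK of this era; the image names are placeholders (the deck's own three-step example follows on the next slide):

import kfp.dsl as dsl
import kfp.compiler as compiler

@dsl.pipeline(name="train-and-serve", description="Ingest then train")
def train_and_serve():
    # file_outputs exposes the contents of /output.txt as a named
    # output that downstream steps can consume.
    ingest = dsl.ContainerOp(
        name="ingest", image="tft-image",
        file_outputs={"bucket": "/output.txt"})
    # Referencing ingest.outputs['bucket'] passes the value and also
    # tells the orchestrator (Argo) that train depends on ingest.
    train = dsl.ContainerOp(
        name="train", image="tfjob-image",
        arguments=[ingest.outputs["bucket"]])

# Compile to an Argo workflow package for upload to the Pipelines UI.
compiler.Compiler().compile(train_and_serve, "train_and_serve.tar.gz")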
Rich Container-Based Pipelines
Ingestion (TF.Transform) → Training (TF.Job) → Serving (TF.Serving)

ingestStep = dsl.ContainerOp(image=tft_image, <params>,
                             file_outputs={'bucket': '/output.txt'})
trainStep = dsl.ContainerOp(image=tfjob_image, <params>,
                            arguments=[ingestStep.outputs['bucket']])
servingStep = dsl.ContainerOp(image=tfs_image, <params>,
                              arguments=[trainStep.outputs['bucket']])
Can I Change a Step?
Rich Container-Based Pipelines
Ingestion (TF.Transform) → Training (TF.Job) → Serving (TF.Serving)

ingestStep = dsl.ContainerOp(image=tft_image, <params>,
                             file_outputs={'bucket': '/output.txt'})
trainStep = dsl.ContainerOp(image=tfjob_image, <params>,
                            arguments=[ingestStep.outputs['bucket']])
servingStep = dsl.ContainerOp(image=tfs_image, <params>,
                              arguments=[trainStep.outputs['bucket']])
NVIDIA TENSORRT INFERENCE SERVER
Production Data Center Inference Server
• Maximize inference throughput & GPU utilization
• Quickly deploy and manage multiple models per GPU per node
• Easily scale to heterogeneous GPUs and multi-GPU nodes
• Integrates with orchestration systems and auto scalers via latency and health metrics
• Now open source for thorough customization and integration
[Diagram: TensorRT Inference Server instances serving models across Tesla T4, V100, and P4 GPUs]
FEATURES
• Concurrent Model Execution: Multiple models (or multiple instances of the same model) may execute on the GPU simultaneously
• Dynamic Batching: Inference requests can be batched up by the inference server to 1) the model-allowed maximum or 2) the user-defined latency SLA
• Eager Model Loading: Any mix of models specified at server start; all models loaded into memory
• Multiple Model Format Support: TensorFlow GraphDef/SavedModel, TensorFlow and TensorRT GraphDef, TensorRT Plans, Caffe2 NetDef (ONNX import path)
• CPU Model Inference Execution: Framework-native models can execute inference requests on the CPU
• Mounted Model Repository: Models must be stored on a locally accessible mount point
• Metrics: Utilization, count, and latency (see the probe sketch below)
• Custom Backend: Allows the user more flexibility by providing their own implementation of an execution engine through the use of a shared library
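Because the server exposes health and metrics endpoints, an orchestrator or autoscaler can poll it over plain HTTP. A minimal probe sketch, assuming the HTTP API of TensorRT Inference Server releases from this era (endpoint paths such as /api/health/ready may differ by version) and a server listening on localhost:8000:

import requests

BASE = "http://localhost:8000"  # assumed server address and port

# Readiness probe: Kubernetes can gate traffic on this returning 200.
ready = requests.get(BASE + "/api/health/ready")
print("ready:", ready.status_code == 200)

# Server status, including which models are loaded and available.
status = requests.get(BASE + "/api/status")
print(status.text[:200])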
Rich Container-Based Pipelines
Ingestion (TF.Transform) → Training (TF.Job) → Serving (TF.Serving)

ingestStep = dsl.ContainerOp(image=tft_image, <params>,
                             file_outputs={'bucket': '/output.txt'})
trainStep = dsl.ContainerOp(image=tfjob_image, <params>,
                            arguments=[ingestStep.outputs['bucket']])
servingStep = dsl.ContainerOp(image=tfs_image, <params>,
                              arguments=[trainStep.outputs['bucket']])
Rich Container-Based Pipelines
Ingestion (TF.Transform) → Training (TF.Job) → Serving (TF.Serving)

ingestStep = dsl.ContainerOp(image=tft_image, <params>,
                             file_outputs={'bucket': '/output.txt'})
trainStep = dsl.ContainerOp(image=tfjob_image, <params>,
                            arguments=[ingestStep.outputs['bucket']])
servingStep = dsl.ContainerOp(image=trt_image, <params>,
                              arguments=[trainStep.outputs['bucket']])
Rich Container-Based Pipelines
Ingestion (TF.Transform) → Training (TF.Job) → Serving (TensorRT)

ingestStep = dsl.ContainerOp(image=tft_image, <params>,
                             file_outputs={'bucket': '/output.txt'})
trainStep = dsl.ContainerOp(image=tfjob_image, <params>,
                            arguments=[ingestStep.outputs['bucket']])
servingStep = dsl.ContainerOp(image=trt_image, <params>,
                              arguments=[trainStep.outputs['bucket']])
Now, Add a Step