Multi-tenant Machine Learning Apache Aurora & Apache Mesos

Multi-tenant Machine Learning Apache Aurora & Apache Mesos Stephan Erb serb@apache.org 2016.11.15 @ErbStephan

Apache Aurora https://aurora.apache.org
Mesos framework for the deployment and scaling of stateless and fault-tolerant services in a datacenter

Apache Mesos https://mesos.apache.org
Cluster manager providing fault-tolerant, fine-grained multitenancy via containers

Apache Aurora https://aurora.apache.org "distributed supervisord"
Apache Mesos https://mesos.apache.org "plumbing"

Cluster Manager

Aurora Example

  webservice = Process(
    name = 'webservice',
    cmdline = './run_my_webservice.py')

  task = Task(
    processes = [webservice],
    resources = Resources(cpu=4, ram=4*GB, disk=8*GB))

  jobs = [
    Job(
      task = task,
      instances = 4,
      constraints = {'host': 'limit:1'},
      service = True,
      cluster = 'rz1', role = 'www', environment = 'prod', name = 'webserver'),
  ]
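The constraint 'host': 'limit:1' in the job above caps the number of instances of this job that may run on any single host. The following is a minimal Python sketch of that rule only, not Aurora's scheduler code; the function name satisfies_host_limit and the host names are hypothetical.

```python
from collections import Counter


def satisfies_host_limit(placement, limit=1):
    """Check a limit constraint: at most `limit` instances per host.

    `placement` maps an instance id to the host it was placed on.
    (Hypothetical helper for illustration; not part of Aurora.)
    """
    per_host = Counter(placement.values())
    return all(count <= limit for count in per_host.values())


# Four instances spread over four hosts satisfy limit:1.
spread = {0: "host-a", 1: "host-b", 2: "host-c", 3: "host-d"}
# Two instances sharing host-a violate limit:1.
packed = {0: "host-a", 1: "host-a", 2: "host-b", 3: "host-c"}

print(satisfies_host_limit(spread))
print(satisfies_host_limit(packed))
```

With instances=4 and limit:1, a placement is only valid if at least four distinct hosts are available.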

Aurora Example

  $ aurora update start rz1/www/prod/webserver \
      webserver.aurora

Coordinator Node
• Aurora Scheduler
• ZooKeeper
• Mesos Master
• State

Worker Node
• Mesos Agent
• Task (Container): Aurora Executor, User Code

Photo by liz west https://flic.kr/p/7qYh21

[Diagram: Customer System; Data Delivery; Predictions and Decisions; Compute Platform containing Tenant / ML model (shown twice) and Historic Tenant Data]

Key Achievement Data scientists deploy to production.

bigger VM/Host VM/Host

Data larger than RAM

Implementation Choices:
• semi-external implementation (out-of-core)
• communication-efficient distributed memory implementation
• streaming (aka "large data volumes are hard, infinite data is easy")

Domain-specific Problem Decomposition

  # Compute on whole data set
  #
  compute_prediction(data)

  # Compute on partitioned data
  #
  # (this is rather restrictive but tends to
  # work great for many usecases)
  #
  for chunk in partition(data):
      compute_prediction(chunk)
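A minimal runnable sketch of the partitioned variant, assuming the data can be split by a domain key so that each chunk is predicted independently. The record layout, the "store" key, and the mean-based compute_prediction are illustrative assumptions, not the model from the talk.

```python
from collections import defaultdict


def partition(records, key):
    """Group records into independent chunks by a domain key."""
    chunks = defaultdict(list)
    for record in records:
        chunks[record[key]].append(record)
    return list(chunks.values())


def compute_prediction(chunk):
    """Placeholder model: the mean of the observed sales in the chunk."""
    values = [record["sales"] for record in chunk]
    return sum(values) / len(values)


# Hypothetical toy data set with two partitions ("a" and "b").
data = [
    {"store": "a", "sales": 10.0},
    {"store": "b", "sales": 4.0},
    {"store": "a", "sales": 20.0},
]

# Each chunk fits in memory on its own and needs no other chunk.
predictions = {
    chunk[0]["store"]: compute_prediction(chunk)
    for chunk in partition(data, key="store")
}
print(predictions)
```

Because no chunk depends on another, the loop body can be handed to separate workers without any communication between them.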

Python Scheduling

Master
• manages job graphs
• guarantees fault tolerance

Workers
• run python functions
• distributable
• dynamic worker count

http://www.celeryproject.org/
http://distributed.readthedocs.io/en/latest/
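The master/worker split can be sketched with Python's standard library. This is a stand-in that uses concurrent.futures on a single machine rather than Celery or distributed; the helper run_all and its retry policy are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_prediction(chunk):
    """Placeholder work item: the mean of a list of numbers."""
    return sum(chunk) / len(chunk)


def run_all(func, args, max_workers=4, retries=1):
    """Submit one task per argument and resubmit a task if it fails.

    The calling code plays the master role (tracks tasks, retries
    failures); the pool threads play the worker role (run functions).
    """
    results = [None] * len(args)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, arg) for arg in args]
        for index, future in enumerate(futures):
            attempt = 0
            while True:
                try:
                    results[index] = future.result()
                    break
                except Exception:
                    if attempt >= retries:
                        raise
                    attempt += 1
                    # Simple fault tolerance: run the failed task again.
                    future = pool.submit(func, args[index])
    return results


chunks = [[1.0, 3.0], [10.0], [2.0, 4.0, 6.0]]
print(run_all(compute_prediction, chunks))
```

Changing max_workers changes the degree of parallelism without touching the submitted functions, which is the property the "dynamic worker count" bullet refers to.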

Cluster Scheduling: Project/Tenant, Compute Cluster

Key Idea: Multi-tenancy via multi-instance deployments

"Good multi-tenancy is hard enough that it just doesn't happen by accident." -- Jay Kreps
https://www.confluent.io/blog/sharing-is-caring-multi-tenancy-in-distributed-data-systems

Multi-tenant Features

Aurora
• Structured job keys
  • role (tenant01, …)
  • environments (devel, …)
  • name
• Job tiers/priorities
• Quota & preemption

Mesos
• Linux users
• Filesystem isolation via Docker/Appc containers
• CPU/RAM isolation via cgroups
• Linux namespaces (pid, network, …)
• Multi-framework support
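Structured job keys take the form cluster/role/environment/name, as in rz1/www/prod/webserver from the update command earlier. A small sketch of how one key per tenant could be derived when the role identifies the tenant; the JobKey class, the tenant02 role, and the job name "model" are hypothetical and not part of Aurora's client.

```python
from typing import NamedTuple


class JobKey(NamedTuple):
    """A job key of the form cluster/role/environment/name."""
    cluster: str
    role: str
    environment: str
    name: str

    @classmethod
    def parse(cls, text):
        parts = text.split("/")
        if len(parts) != 4 or not all(parts):
            raise ValueError(
                f"expected cluster/role/environment/name, got {text!r}")
        return cls(*parts)

    def __str__(self):
        return "/".join(self)


# The key used in the update command.
web = JobKey.parse("rz1/www/prod/webserver")
print(web.role, web.environment)

# One deployment per tenant: same cluster, environment, and job name,
# but a separate role for each tenant (hypothetical values).
tenants = ["tenant01", "tenant02"]
keys = [JobKey("rz1", tenant, "prod", "model") for tenant in tenants]
print([str(key) for key in keys])
```

Keeping the tenant in the role component means every tenant's jobs share a common key prefix, which is what makes multi-instance deployments easy to tell apart.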

Merits and Pitfalls? Multiple frameworks on the same Mesos cluster

Feature Dimensions

User
• long-running services
• cron jobs & adhoc jobs
• rolling job updates, with automatic rollback
• service announcement in ZooKeeper
• scheduling constraints
• Docker/Appc support
• self-service UIs

Operator
• high-availability
• maintenance primitives
• resource quotas and preemption
• instrumented for monitoring and debugging
• oversubscription

Oversubscription https://github.com/blue-yonder/mesos-threshold-oversubscription

Executive Summary

In this talk, we have seen:
• Aurora & Mesos provide excellent support for heterogeneous workloads.
• They can even be used by data scientists to ship machine learning models into production.
• All without major headaches for your operations team.

Thank you! Stephan Erb serb@apache.org 2016.11.15 @ErbStephan
