High Performance Computing with doAzureParallel
Using Azure as your Parallel Backend for Embarrassingly Parallel Work
JS Tan, Microsoft
Azure Big Compute
Azure Infrastructure
• Commodity VMs: most value for cost
• Fast processors, higher memory-to-core ratio, SSDs
• Fast processors, lower memory-to-core ratio, SSDs
• Most memory, Intel Xeon processors
• HPC / low-latency VMs for compute-intensive workloads
• GPU-enabled VMs for visualization / compute
What is Batch?
• Many individual tasks
• Many computers/VMs
• Tasks are assigned to computers/VMs
Scenarios
• A quant back-testing portfolio strategies
• A data scientist optimizing their model and tuning parameters
• A life-science researcher doing genome sequencing
What do they have in common?
• Scale – computationally expensive work that needs to scale up in order to get results back quickly
• Minimal IT management – the user is a domain specialist, not an IT specialist
• Elastic compute – a temporary need for a lot of capacity
• Cost effective – low-cost strategies are important!
+ They are all probably using R…
doAzureParallel is… an R package that uses Azure as a parallel backend for popular open-source tools – foreach, caret, dplyr, etc.
Foreach using doAzureParallel

foreach(i = 1:100) %dopar% {
  myParallelAlgorithm(...)
}
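The loop above runs on Azure once doAzureParallel is registered as the foreach backend. A minimal sketch of that registration, assuming a credentials file and cluster configuration file already exist (myParallelAlgorithm is a hypothetical user function standing in for any expensive computation):

```r
# Sketch: pointing foreach at Azure via doAzureParallel.
library(doAzureParallel)

# Authenticate with the Azure Batch and Storage accounts
# described in credentials.json.
setCredentials("credentials.json")

# Provision (or connect to) the cluster described in cluster.json.
cluster <- makeCluster("cluster.json")

# Register the cluster as the parallel backend for %dopar%.
registerDoAzureParallel(cluster)

# The same foreach loop as before now fans out across the cluster,
# and the combined results come back to the local R session.
results <- foreach(i = 1:100) %dopar% {
  myParallelAlgorithm(i)   # hypothetical expensive computation
}

# Tear the cluster down when finished to stop incurring VM charges.
stopCluster(cluster)
```

Because only the backend registration changes, the same loop runs locally with doParallel or at scale with doAzureParallel.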
doAzureParallel on Azure Batch
Azure Batch is a platform service that provides easy job scheduling and cluster management, allowing applications or algorithms to run in parallel at scale.
• Capacity on demand; jobs on demand
• Autoscale (more on this later)
• Minimal cluster management (node failures, installs, etc.)
• Hardware choice – use any VM size
• Pay by the minute
• Cost effective – no charge for the service itself; you only pay for the VMs
• More cost effective – low-priority VMs (more on this later)
If you want to run jobs using elastic compute, Batch is a great fit!
Scale
• From 1 to 10,000 VMs in a cluster
• From 1 to millions of tasks
• Your selection of hardware:
  • General-compute VMs (A-series / D-series)
  • Memory/storage-optimized VMs (G-series)
  • Compute-optimized VMs (F-series)
  • GPU-enabled VMs (N-series)
[Chart: time to compute the Mandelbrot set on a local machine vs. 5, 10, and 20 parallel workers]
Minimal Cluster Management
• Abstracts away complex Azure/cloud concepts
• Zero IT-level management
• Work entirely in RStudio
• Monitor / debug your jobs directly in RStudio
• Manage your cluster and multiple jobs directly in RStudio
• The results of your distributed, large-scale work are returned directly to your R session
Minimal Code Change
• Minimal code change is needed to use doAzureParallel
• Easy to use – you can get started in just a few lines of code
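To illustrate how little changes, a hedged sketch contrasting a local doParallel run with the doAzureParallel equivalent (expensiveSimulation is a hypothetical user function; the loop body is identical in both cases):

```r
library(foreach)

# --- Local: parallelize across the cores of one machine ---
library(doParallel)
registerDoParallel(cores = 4)
results <- foreach(i = 1:100) %dopar% expensiveSimulation(i)

# --- Azure: the only change is which backend is registered ---
library(doAzureParallel)
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)
results <- foreach(i = 1:100) %dopar% expensiveSimulation(i)
```

The foreach loop itself is untouched; swapping the registered backend is the entire code change.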
Elastic Compute
• Compute on demand – create/delete your cluster as you need it
• Autoscaling pool = maximizing cloud elasticity
• Long-running batch jobs / overnight work
• Daily scheduled work – pre-provision the cluster so it's ready for you at the beginning of the day
• Bursty work
Cost Effective
• Low priority = (extremely) low cost
• Provisions VMs from Azure's surplus capacity at up to an 80% discount
• Your Azure cluster can contain both regular (dedicated) VMs and low-priority VMs
[Diagram: a local R session driving an Azure Batch pool that mixes dedicated VMs with low-priority VMs at up to 80% discount]
Cost Effective: More about Low Priority
When should I use it?
• Long-running work that can be broken into smaller pieces and that doesn't have a strict deadline
• Experimentation, testing, and evaluating models
What you need to know when using it:
• It is possible that Azure will not allocate your VMs, or that it will take some or all of the capacity back
• If a node is pre-empted:
  • Azure Batch will replace the node for you
  • Azure Batch will reschedule your work so that your job can successfully complete
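A mixed pool is declared in the cluster configuration file passed to makeCluster. A sketch of a cluster.json combining a small dedicated baseline with a larger low-priority pool (the field values here are illustrative, not recommendations; check the package documentation for the schema of your doAzureParallel version):

```json
{
  "name": "my-mixed-pool",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": { "min": 2, "max": 2 },
    "lowPriorityNodes": { "min": 0, "max": 10 },
    "autoscaleFormula": "QUEUE"
  }
}
```

The dedicated nodes guarantee baseline capacity; the low-priority nodes add cheap burst capacity that Azure may reclaim, in which case Batch reschedules the affected tasks.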
Low-Priority Scenarios
[Charts: three Azure Batch pool capacity-over-time scenarios, distinguishing dedicated, low-priority, and pre-empted nodes: all low-priority nodes (lowest cost); a dedicated baseline plus low-priority nodes (lower cost with guaranteed baseline capacity); and low-priority nodes with autoscale (lower cost while maintaining capacity)]
Questions? www.github.com/azure/doazureparallel https://aka.ms/earl2017
What’s new with doAzureParallel?
• Low-priority support ✓
• Richer job-management experience ✓
• Resource files to preload data ✓
• Parameter-tuning integration with caret ✓
• Simple connector to Azure Blob Storage ✓
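The caret integration works because caret's train() parallelizes its resampling loop over whatever foreach backend is registered. A sketch of parameter tuning on Azure, assuming a cluster.json as before (the random-forest grid on iris is purely illustrative):

```r
# Sketch: caret grid search fanned out across an Azure Batch cluster.
library(caret)
library(doAzureParallel)

setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# allowParallel = TRUE lets train() use the registered %dopar% backend,
# so each cross-validation fold / grid point can run on a separate node.
ctrl <- trainControl(method = "cv", number = 10, allowParallel = TRUE)

model <- train(Species ~ ., data = iris,
               method    = "rf",
               tuneGrid  = expand.grid(mtry = 1:4),
               trControl = ctrl)

stopCluster(cluster)
```

No caret-specific plumbing is needed; registering the backend is enough for grid search, random search, and cross-validation to scale out.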
R + Azure Batch
So what R workloads work great on Azure Batch?
• Simulation-based work (VaR calculation, back-testing, Monte Carlo simulations, financial modelling)
• Parameter tuning / model evaluation (grid search, random search, cross-validation, etc.)
• Computing against data / ETL jobs / data-prep jobs
What industries / verticals might be interested in using this?
• Financial services
• Education & research
• Sports analytics
doAzureParallel (since initial release)
• Initial release in March
• Grass-roots strategy
• End-user focused
• Financial services targeted / key messaging has been around simulation-based work
• Interest from the field
• Feedback