High Performance Computing with doAzureParallel


  1. High Performance Computing with doAzureParallel: Using Azure as your Parallel-Backend for Embarrassingly Parallel work. JS Tan, Microsoft

  2. Azure Big Compute

  3. Azure Infrastructure • Commodity, most value for cost • Fast processors, higher memory-to-core ratio, SSDs • Fast processors, lower memory-to-core ratio, SSDs • Most memory, Intel Xeon processors • HPC/Low Latency VMs for compute intensive workloads • GPU enabled VMs for Visualization/Compute

  4. What is Batch? • Many individual tasks • Many computers/VMs • Tasks are assigned to computers/VMs

  5. Scenarios • A quant back-testing portfolio strategies • A data scientist optimizing their model & parameter tuning • A life-science researcher doing genome sequencing

  6. What do they have in common? • Scale – computationally expensive work - need to scale up in order to get results back quickly • Minimal IT Management – the user is the domain specialist, not an IT specialist • Elastic compute – temporary need for a lot of capacity • Cost effective – low cost strategies are important! + They are all probably using R…

  7. doAzureParallel is... An R package that uses Azure as a parallel backend for popular open source tools: foreach, caret, dplyr, etc.

  8. Foreach using doAzureParallel: foreach(i = 1:100) %dopar% { myParallelAlgorithm(...) }
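
The loop on this slide can be tried locally before any cluster exists. The sketch below is a minimal, runnable version that assumes only the foreach package: `registerDoSEQ()` is a sequential stand-in for the Azure backend, and the body of `myParallelAlgorithm` is a hypothetical placeholder, since the slide elides it.

```r
library(foreach)

# Stand-in backend so the sketch runs anywhere. With doAzureParallel,
# a registered Azure cluster would take its place.
registerDoSEQ()

# Hypothetical placeholder for the slide's myParallelAlgorithm(...).
myParallelAlgorithm <- function(i) {
  sqrt(i)
}

# Each iteration is independent, so a parallel backend is free to
# send the iterations to different workers.
results <- foreach(i = 1:100) %dopar% {
  myParallelAlgorithm(i)
}

# By default foreach returns a list with one element per iteration.
length(results)
```

Because the loop body does not depend on which backend is registered, only the registration line should need to change when moving from a laptop to a cluster.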

  9. doAzureParallel on Azure Batch Azure Batch is a platform service that provides easy job scheduling and cluster management, allowing applications or algorithms to run in parallel at scale. • Capacity on demand; jobs on demand • Autoscale (more on this later) • Minimal cluster management (node failure, install, etc) • Hardware choice – use any VM size • Pay by the minute • Cost effective – no charge for using it, you only pay for the VMs • More cost effective – low priority VMs (more on this later) If you want to run jobs using elastic compute, Batch is a great fit!

  10. Scale • From 1 to 10,000 VMs for a cluster • From 1 to millions of tasks • Your selection of hardware: • General compute VMs (A-Series / D-Series) • Memory / storage optimized (G-Series) • Compute Optimized (F-Series) • GPU enabled (N-Series) • Results from computing the Mandelbrot set when scaling up (chart comparing: local machine, 5 parallel workers, 10 parallel workers, 20 parallel workers)

  11. Minimal Cluster Management • Abstract away complex Azure/cloud concepts • Zero IT-level management • Work entirely in RStudio • Monitor / Debug your jobs directly in RStudio • Manage your cluster and multiple jobs directly in RStudio • The results of your distributed, large scale work can be returned directly to your R session

  12. Minimal code change • Minimal code change to use doAzureParallel • Easy to use and you can get started in just a few lines of code
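
As a rough illustration of what "a few lines of code" might look like, here is a sketch of the setup around a `foreach` loop. It assumes the function names shown in the doAzureParallel README (`setCredentials`, `makeCluster`, `registerDoAzureParallel`, `stopCluster`), which may differ between package versions. The wrapper `run_on_azure`, its arguments, and the loop body are hypothetical. The function is only defined here, not called, because calling it needs an Azure subscription and two JSON configuration files.

```r
library(foreach)

# Sketch only: the doAzureParallel calls follow the package README as
# best recalled and should be checked against the installed version.
run_on_azure <- function(credentials_file, cluster_file, n = 100) {
  # Point the package at the Azure Batch and Storage account details.
  doAzureParallel::setCredentials(credentials_file)

  # Provision the pool of VMs described in the cluster file.
  cluster <- doAzureParallel::makeCluster(cluster_file)

  # Delete the cluster when the function exits, even after an error,
  # so that VMs do not keep accruing cost.
  on.exit(doAzureParallel::stopCluster(cluster), add = TRUE)

  # Register the cluster as the backend used by %dopar%.
  doAzureParallel::registerDoAzureParallel(cluster)

  # The loop itself is unchanged from a local run.
  foreach(i = seq_len(n)) %dopar% {
    sqrt(i)  # hypothetical placeholder workload
  }
}
```

With valid files in place, a call such as `run_on_azure("credentials.json", "cluster.json")` would be expected to return the results to the local R session.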

  13. Elastic Compute • Compute on-demand • Create/delete your cluster as you need • Autoscaling pool = maximizing cloud elasticity • Long running batch jobs / overnight • Daily scheduled work – pre-provision cluster so it's ready for you at the beginning of the day • Bursty work

  14. Cost Effective • Low-Priority = (extremely) Low Costs • Provisioning VMs from Azure’s surplus capacity at up to 80% discount • Your Azure cluster can contain both regular (dedicated) VMs and low-priority VMs (Diagram: My Local R Session, Azure Batch, Dedicated VMs, Low Priority VMs at up to 80% discount)
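
To make the discount concrete, the arithmetic can be sketched in a few lines of base R. The hourly price below is a made-up figure, not a real Azure price, and the full 80% discount is assumed even though the slide says "up to"; `cluster_cost` is a hypothetical helper.

```r
# Hypothetical hourly price for one dedicated VM; not a real Azure price.
dedicated_price_per_hour <- 0.10

# Best case from the slide ("up to 80% discount"); an assumption.
low_priority_discount <- 0.80

low_priority_price_per_hour <-
  dedicated_price_per_hour * (1 - low_priority_discount)

# Cost of running a mixed cluster for a given number of hours.
cluster_cost <- function(dedicated_nodes, low_priority_nodes, hours) {
  hours * (dedicated_nodes * dedicated_price_per_hour +
           low_priority_nodes * low_priority_price_per_hour)
}

# Three ways to run 10 nodes for 8 hours.
all_dedicated    <- cluster_cost(10, 0, hours = 8)
mixed            <- cluster_cost(2, 8, hours = 8)
all_low_priority <- cluster_cost(0, 10, hours = 8)

print(c(all_dedicated = all_dedicated,
        mixed = mixed,
        all_low_priority = all_low_priority))
```

Under these assumptions the mixed cluster keeps two guaranteed nodes while costing well under half of the all-dedicated option. The trade-off is that the low-priority nodes can be pre-empted.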

  15. Cost Effective: More about Low Priority When should I use it? • Long running work that can be broken into smaller pieces and that doesn't have a strict time limit to complete • Experimentation, testing, evaluating models What you need to know when using it: • Azure may not allocate your VMs, or it may take some or all of the capacity back • If a node is pre-empted, Azure Batch will replace your node for you and reschedule your work so that your job can successfully complete

  16. Low Priority Scenarios • Lowest cost • Lower cost + guaranteed baseline capacity • Lower cost + maintaining capacity w/ autoscale (Three charts of Azure Batch Pool capacity over time; legend: Dedicated, Low-priority, Preempted)

  17. Questions? www.github.com/azure/doazureparallel https://aka.ms/earl2017

  18. What’s new with doAzureParallel? • Low priority support • Richer Job Management experience • Resource Files to preload data • Parameter Tuning integration with Caret • Simple connector to Azure Blob Storage

  19. R + Azure Batch So what R workloads work great on Azure Batch? • Simulation based work (VaR calculation, back-testing, Monte Carlo simulations, financial modelling) • Parameter Tuning / Model Evaluation (grid search, random search, cross-validation, etc) • Computing against data / ETL jobs / Data-prep jobs What industries / verticals might be interested in using this? • Financial Services • Education & Research • Sports analytics
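
Simulation workloads fit this model because each batch of simulated draws is independent of the others. The sketch below is a toy Monte Carlo estimate of pi, not taken from the slides, split into 20 independent batches with `foreach`. It assumes only the foreach package and uses `registerDoSEQ()` as a local stand-in for a cluster backend. `estimate_pi_batch` and the batch sizes are illustrative choices.

```r
library(foreach)

# Sequential stand-in backend; a cluster backend would be registered
# here instead to spread the batches across workers.
registerDoSEQ()

# One independent batch: sample points in the unit square and count
# the share that falls inside the quarter circle of radius 1.
estimate_pi_batch <- function(seed, n_points = 10000) {
  set.seed(seed)  # a separate seed per batch keeps runs reproducible
  x <- runif(n_points)
  y <- runif(n_points)
  4 * mean(x^2 + y^2 <= 1)
}

# .combine = c collects the per-batch estimates into a numeric vector.
batch_estimates <- foreach(seed = 1:20, .combine = c) %dopar% {
  estimate_pi_batch(seed)
}

# Average the batches for the final estimate.
pi_estimate <- mean(batch_estimates)
print(pi_estimate)
```

The same split-then-combine shape applies to the other workloads on this slide, for example one task per parameter combination in a grid search.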

  20. doAzureParallel (since initial release) • Initial release in March • Grass roots strategy • End-user focused • Financial Services targeted / key messaging has been around simulation based work • Interest from the field • Feedback

  21. (Diagram) My Local R Session • Azure Batch • Dedicated VMs • Low Priority VMs at up to 80% discount
