Towards Realizing the Potential of Malleable Parallel Jobs
Abhishek Gupta | Bilge Acun | Osman Sarood | Laxmikant Kale
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL (acun2@illinois.edu)
IEEE International Conference on High Performance Computing (HiPC) 2014
Malleable Parallel Jobs
• Dynamic shrink/expand of the number of processors
  - Shrink: a parallel application running on nodes of set A is resized to run on nodes of set B, where B ⊂ A
  - Expand: a parallel application running on nodes of set A is resized to run on nodes of set B, where B ⊃ A
  - Rescale: shrink or expand
• Twofold merit
  - Provider perspective: better system utilization and throughput; honor job priorities
  - User perspective: earlier response time; dynamic pricing offered by cloud providers such as Amazon EC2; better value for the money spent, based on priorities and deadlines
Malleable jobs have tremendous but unrealized potential. What do we need to enable malleable HPC jobs?
Components of a Malleable Jobs System
[System diagram: an adaptive job scheduler takes new jobs from the job queue and sends shrink/expand scheduling decisions to an adaptive resource manager, which launches jobs on cluster nodes, monitors cluster state changes, and exchanges shrink/expand requests and acknowledgements with an adaptive/malleable parallel runtime.]
We will focus on the malleable parallel runtime.
Related Work
• Prior works focus on job scheduling strategies
• A parallel runtime for malleable HPC jobs remains an open problem
• Existing approaches
  - Residual processes when shrinking: Charm++ malleable jobs (Kale et al.), dynamic MPI (Cera et al.)
  - Too much application-specific programmer effort on resize: dynamic malleability of iterative MPI applications using PCM
• Our focus: a parallel runtime to render a job malleable
  - No residual processes
  - Little application-specific programming effort
  - Goals: efficient, fast, scalable, generic, practical, low-effort!
Definitions and Goals
• Shrink: a parallel application running on nodes of set A is resized to run on nodes of set B, where B ⊂ A
• Expand: a parallel application running on nodes of set A is resized to run on nodes of set B, where B ⊃ A
• Rescale: shrink or expand
• Goals: efficient, fast, scalable, generic, practical, low-effort
Approach (Shrink)
The launcher (charmrun) receives a shrink request from an external client over CCS. At the next synchronization point, the application processes check for the pending shrink/expand request, then:
1. Object evacuation and load balancing move tasks/objects off the departing processes
2. Checkpoint to Linux shared memory
3. Rebirth (exec) on surviving nodes, or die (exit) on departing ones
4. Reconnect protocol between the launcher and the reborn processes
5. Restore objects from the checkpoint; execution resumes via a stored callback
6. ShrinkAck is sent to the external client
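The shrink sequence above can be sketched as a short control flow. This is a minimal, hypothetical Python sketch of the logic (the function and data-structure names are illustrative, not the Charm++/charmrun API):

```python
# Hypothetical sketch of the shrink sequence; names are illustrative,
# not the real Charm++/charmrun API.

def shrink(objects_by_proc, keep):
    """objects_by_proc: {rank: [objects]}; keep: ranks that survive (B ⊂ A)."""
    ranks = set(objects_by_proc)
    assert set(keep) < ranks, "shrink requires the new set to be a proper subset"

    # 1. Sync point: evacuate objects from the departing processes.
    evacuated = [o for r in ranks - set(keep) for o in objects_by_proc[r]]

    # 2. Load balance: spread all objects evenly over the surviving ranks.
    survivors = sorted(keep)
    all_objs = [o for r in survivors for o in objects_by_proc[r]] + evacuated
    rebalanced = {r: [] for r in survivors}
    for i, obj in enumerate(all_objs):
        rebalanced[survivors[i % len(survivors)]].append(obj)

    # 3. Checkpoint (here: an in-memory dict; the real system uses Linux shm).
    checkpoint = {r: list(objs) for r, objs in rebalanced.items()}

    # 4. Rebirth via exec() on survivors / exit() on the rest, reconnect,
    #    then restore from the checkpoint and resume via a stored callback.
    return checkpoint

state = shrink({0: ["a", "b"], 1: ["c"], 2: ["d", "e"], 3: ["f"]}, keep={0, 1})
assert set(state) == {0, 1} and sum(map(len, state.values())) == 6
```

The key property the sketch illustrates is that no residual processes remain: every object ends up on a surviving rank before the departing processes exit.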
Approach (Expand)
The launcher (charmrun) receives an expand request over CCS. At the next synchronization point, the application processes check for the pending shrink/expand request, then:
1. Checkpoint to Linux shared memory
2. Rebirth (exec) on existing nodes, or launch (ssh, fork) on new ones
3. Connect protocol between the launcher and all processes
4. Restore objects from the checkpoint
5. Load balancing spreads objects over the enlarged set of processes
6. ExpandAck is sent to the external client; execution resumes via a stored callback
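Both paths rely on checkpointing to Linux shared memory, which persists across an exec()-style rebirth of the process. A minimal sketch of that idea using Python's POSIX shared-memory wrapper (the segment name and state layout are hypothetical):

```python
# Minimal sketch: persisting a checkpoint in POSIX shared memory so it
# survives an exec()-style rebirth of the process. Names are illustrative.
import pickle
from multiprocessing import shared_memory

def save_checkpoint(name, state):
    blob = pickle.dumps(state)
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(blob))
    shm.buf[:len(blob)] = blob
    shm.close()          # the segment persists until it is explicitly unlinked
    return len(blob)

def restore_checkpoint(name, size):
    shm = shared_memory.SharedMemory(name=name)
    state = pickle.loads(bytes(shm.buf[:size]))
    shm.close()
    shm.unlink()         # free the segment after a successful restore
    return state

n = save_checkpoint("job42_ckpt", {"step": 1000, "grid": [0.0] * 8})
assert restore_checkpoint("job42_ckpt", n)["step"] == 1000
```

Because the segment lives in the kernel rather than in the process image, the checkpoint is both fast (no disk I/O) and persistent across the rebirth, which is exactly why the approach uses it.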
Malleable RTS Approach Summary
• Task/object migration: application-transparent redistribution
• Checkpoint-restart: clean restart (rebirth)
• Load balancing: efficient execution after rescale
• Linux shared memory: fast and persistent checkpoint
• Implementation atop Charm++
Components of a Malleable Jobs System
[Same system diagram as before: adaptive job scheduler, adaptive resource manager, and adaptive/malleable parallel runtime.]
Adaptivity in the Resource Manager
• How and when to:
  - Communicate scheduling decisions to the parallel application
  - Detect success or failure of those actions
• A resource manager to RTS communication channel (the how)
• Split-phase execution of scheduling decisions (the when)
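Split-phase execution means the resource manager issues a rescale decision and keeps scheduling, detecting success only when the acknowledgement arrives later. A minimal sketch of that pattern with in-process queues (the message shapes are hypothetical, not the actual RM-RTS protocol):

```python
# Sketch of split-phase scheduling: the resource manager sends a rescale
# decision over a channel and continues; the runtime acknowledges later.
# Hypothetical message structure, not the actual RM-RTS protocol.
import queue
import threading

to_rts, from_rts = queue.Queue(), queue.Queue()

def runtime():
    msg = to_rts.get()                     # picked up at the next sync point
    # ... evacuate objects, checkpoint, rebirth, rebalance ...
    from_rts.put(("ack", msg["op"], msg["nodes"]))

threading.Thread(target=runtime, daemon=True).start()
to_rts.put({"op": "shrink", "nodes": 8})   # phase 1: issue decision, don't block
# ... the scheduler keeps servicing the job queue here ...
assert from_rts.get(timeout=5) == ("ack", "shrink", 8)  # phase 2: detect success
```

Decoupling the two phases keeps the scheduler responsive while the runtime performs the (comparatively slow) rescale.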
Experimental Evaluation
• Four HPC mini-applications with Charm++:
  - Stencil2D: 5-point stencil on a 2D grid using Jacobi relaxation
  - LeanMD: mini-app version of the NAMD molecular dynamics application
  - Wave2D: 2D mesh-based mini-app for simulating wave propagation
  - Lulesh: Charm++ version of the LULESH hydrodynamics mini-app
• All experiments were run on the Stampede supercomputer
• Evaluated against the design goals
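For concreteness, the kernel of Stencil2D is a 5-point Jacobi sweep: each interior point is replaced by the average of its four neighbors. An illustrative single-step sketch (not the benchmark's actual code):

```python
# A minimal 5-point Jacobi relaxation sweep, like Stencil2D's kernel
# (illustrative only, not the benchmark's code).
def jacobi_step(grid):
    n = len(grid)
    new = [row[:] for row in grid]         # boundary values are kept fixed
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1])
    return new

g = [[0.0] * 4 for _ in range(4)]
g[0] = [100.0] * 4                         # hot boundary row
g = jacobi_step(g)
assert g[1][1] == 25.0                     # 0.25 * (100 + 0 + 0 + 0)
```

In the parallel version the grid is decomposed into blocks (Charm++ objects), which is what makes the object-migration approach to rescaling apply directly.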
Results: Adaptivity
[Figure: LeanMD performance around a rescale; lower is better.] LeanMD adapts its load distribution on rescale, showing that our approach is efficient.
Results: Scalability
[Figure: total rescale time for a Stencil2D 24K-by-24K shrink; lower is better.] The approach scales well with increasing number of processors.
Results: Scalability
[Figure: total rescale time for a Stencil2D 256→128 shrink, up to 640 MB per process at 96K; lower is better.] The approach scales well with increasing problem size.
Results Summary
• Adapts load distribution well on rescale (efficient)
• 2k→1k shrink in 13 s, 1k→2k expand in 40 s (fast)
• Scales well with core count and problem size (scalable)
• Little application programmer effort (low-effort)
  - 4 mini-applications: Stencil2D, LeanMD, Wave2D, Lulesh
  - 15-37 SLOC of changes; for Lulesh, 0.4% of the original SLOC
• Can be used on most supercomputers (practical)
What are the benefits of malleability?
Applicability and Benefits
• Provider perspective
  - Improve utilization: malleable jobs + adaptive job scheduling
  - Stampede interactive mode used as the cluster for demonstration
• Non-traditional use cases
  - Clouds: price-sensitive rescale in spot markets
  - Proactive fault tolerance
Provider Perspective: Case Study
• 5 jobs: Stencil2D, 1000 iterations each
• 4-16 nodes per job, 16 cores per node; 16 nodes total in the cluster
• Dynamic equipartitioning for malleable jobs; FCFS for rigid jobs
[Figure: cluster state over time, malleable vs. rigid. When Job 1 shrinks, response time is reduced; when Job 5 expands, idle nodes are reclaimed and utilization improves; overall makespan is reduced.]
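Dynamic equipartitioning divides the cluster's nodes as evenly as possible among the running malleable jobs, triggering rescales as jobs arrive and finish. A minimal sketch of the allocation rule (illustrative only, not the paper's scheduler; the tie-breaking by arrival order is an assumption):

```python
# Sketch of dynamic equipartitioning: divide cluster nodes as evenly as
# possible among running malleable jobs. Illustrative, not the paper's code;
# giving the remainder to earlier-arriving jobs is an assumed tie-break.
def equipartition(total_nodes, jobs):
    share, extra = divmod(total_nodes, len(jobs))
    return {j: share + (1 if i < extra else 0) for i, j in enumerate(jobs)}

alloc = equipartition(16, ["job1", "job2", "job3", "job4", "job5"])
assert sum(alloc.values()) == 16
assert max(alloc.values()) - min(alloc.values()) <= 1

# When job5 finishes, the survivors expand to reclaim its nodes:
alloc = equipartition(16, ["job1", "job2", "job3", "job4"])
assert all(v == 4 for v in alloc.values())
```

Each change in the allocation map corresponds to a shrink or expand request sent to the affected jobs, which is where the malleable runtime comes in.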
Provider Perspective: Case Study
[Figure: per-job node allocations over time; smaller quadrilaterals are better. Gaps (in seconds) separate two rescales of the same job.] Significant improvement in mean response time and utilization.
Benefits: Non-traditional Use Cases
• Cloud spot markets: price-sensitive rescale over the spot instance pool
  - Expand when the spot price falls below a threshold
  - Shrink when it exceeds the threshold
• Proactive fault tolerance
  - Shrink on a failure-imminent notice from the resource manager
  - Expand when the failed node comes back
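The price-sensitive policy above reduces to a small decision rule. A hedged sketch (the threshold, prices, and function name are hypothetical):

```python
# Sketch of the price-sensitive rescale policy: expand below a price
# threshold, shrink above it. Threshold and prices are hypothetical.
def rescale_decision(spot_price, threshold, using_spot):
    if spot_price < threshold and not using_spot:
        return "expand"     # grab spot instances while they are cheap
    if spot_price >= threshold and using_spot:
        return "shrink"     # fall back to the static reserved pool
    return "hold"

assert rescale_decision(0.25, 0.50, using_spot=False) == "expand"
assert rescale_decision(0.90, 0.50, using_spot=True) == "shrink"
assert rescale_decision(0.25, 0.50, using_spot=True) == "hold"
```

The same shrink/expand mechanism serves proactive fault tolerance: the trigger is a failure-imminent notice instead of a price crossing.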
Summary
• A novel technique to enable malleability in HPC jobs
• Salient features: task migration, load balancing, checkpoint-restart, and Linux shared memory
• Scheduler-RTS communication and split-phase scheduling
• Experimental evaluation: fast, scalable, and effective
• Related and ongoing work:
  - Malleable jobs with Charm++ integrated into Torque/MOAB
  - "A Batch System with Efficient Adaptive Scheduling for Malleable and Evolving Applications," Suraj Prabhakaran et al., IPDPS'15
  - Standardizing an API for malleable and evolving jobs with Adaptive Computing
Backup
Results
User Perspective: Price-sensitive Rescale in Spot Markets
• Spot markets: bidding-based, dynamic price (e.g., Amazon EC2 spot price variation for the cc2.8xlarge instance, Jan 7, 2013)
• Set a high bid to avoid termination (e.g., $1.25)
• Pay whatever the spot price is, or make no progress
• Can I control the price I pay, and still make progress?
• Our solution: keep two pools
  - Static: a certain minimum number of reserved instances
  - Dynamic: price-sensitive rescale over the spot instance pool; expand when the spot price falls below a threshold, shrink when it exceeds the threshold
User Perspective: Price-sensitive Rescale in Spot Markets
• Price calculation
  - No rescale: $16.65 for 24 hours
  - With rescale: freedom to select the price threshold, though usable hours may be reduced
• Dynamic shrinking and expansion of HPC jobs can enable a lower effective price in cloud spot markets
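The effective-price trade-off can be shown with a worked toy example. The hourly prices and threshold below are hypothetical, chosen only to illustrate the calculation (the $16.65 figure above comes from the actual EC2 trace, not from these numbers):

```python
# Worked toy example of the effective price under a rescale threshold.
# Hourly spot prices and the threshold are hypothetical; the cost of the
# static reserved pool is ignored for simplicity.
prices = [0.30, 0.30, 1.20, 1.50, 0.40, 0.35]   # $/hour over six hours
threshold = 0.50

# Without rescale: pay the spot price every hour to keep the job alive.
no_rescale_cost = sum(prices)
# With rescale: run on spot instances only in hours below the threshold.
rescale_cost = sum(p for p in prices if p < threshold)
usable_hours = sum(1 for p in prices if p < threshold)

assert round(no_rescale_cost, 2) == 4.05
assert round(rescale_cost, 2) == 1.35
assert usable_hours == 4
```

The sketch makes the trade-off explicit: a lower threshold lowers the amount paid per spot instance-hour but shrinks the number of usable hours, which is the "freedom to select the price threshold" noted above.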
Proactive Fault Tolerance