

  1. Survey of TeraGrid Job Distribution: Toward Specialized Serial Machines as TeraGrid Resources
     Arvind Gopu, Richard Repasky, Scott McCaulay, Indiana University, June 5th 2007

  2. Introduction
     • Proceeding toward peta-scale computing
       – On the TeraGrid: massive parallel machines with high-speed, low-latency (HSLL) networking
       – More to follow?
     • HSLL networking gear:
       – Expensive: most often one-third of system cost
       – Additional technical skill set required to build and maintain

  3. Introduction (contd...)
     • A considerable user base runs serial and coarse-grained parallel code
       – A large fraction of compute nodes, and the expensive networking gear, go unutilized
     • TeraGrid job distribution (Oct 2004-06)
     • Backfill: system utilization vs. user satisfaction
     • ... more detail ...
     • Conclusion: specialized serial systems, or hybrid parallel machines, i.e. parts of parallel systems without the HSLL network?

  4. Peta-scale computing on the horizon
     • Aggressively moving toward peta-scale computing: NSF-funded TeraGrid RP sites, plus more
       – Large parallel systems are great for research and have led to many path-breaking findings
     • BUT...
       – High-speed, low-latency networks are expensive!
       – Do all researchers who do computational analyses need large parallel machines? No!

  5. Who does not necessarily need HSLL?
     • Users who run:
       – Legacy serial applications: more likely to have longer walltimes (since they do not use multiple CPUs)
       – Coarse-grained parallel applications: embarrassingly parallel code, master-worker, etc.

  6. Who does not necessarily need HSLL? (contd...)
     • Consider 64 serial jobs, each running for 72 hours (or worse), on 4-core compute nodes
       – 75% of cores, i.e. 3 out of 4 cores per node, sit (mostly) idle for 72 hours: 64 cores active, 192 cores idle!
       – The optical fabric or similar (usually one-third of system cost) connecting those 64 nodes goes unutilized for 72 hours
       – Possibly holding a parallel user up for 72 hours
       – With 8-core nodes it is even worse
     • A bad scenario!
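The arithmetic on this slide can be made explicit with a small back-of-the-envelope calculation. The sketch below is an illustrative Python helper, assuming (as the slide does) that each serial job holds an entire node; the function name and any numbers beyond those on the slide are invented for the example.

```python
# Minimal sketch: estimate idle cores and wasted core-hours when long serial
# jobs each occupy a whole multi-core node behind an HSLL fabric.
# (Parameters mirror the slide's scenario; nothing here is measured data.)

def serial_job_waste(jobs, cores_per_node, walltime_hours):
    """Each serial job uses 1 core but holds an entire node."""
    nodes_held = jobs                                  # one node per serial job
    active_cores = jobs                                # one busy core per job
    idle_cores = nodes_held * (cores_per_node - 1)
    wasted_core_hours = idle_cores * walltime_hours
    return active_cores, idle_cores, wasted_core_hours

# The scenario on this slide: 64 serial jobs, 4-core nodes, 72-hour walltime.
print(serial_job_waste(jobs=64, cores_per_node=4, walltime_hours=72))
# -> (64, 192, 13824): 64 active cores, 192 idle cores, 13824 wasted core-hours

# 8-core nodes make it worse: 448 idle cores, 32256 wasted core-hours.
print(serial_job_waste(jobs=64, cores_per_node=8, walltime_hours=72))
```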

  7. When do serial and coarse-grained parallel jobs help?
     • When they are short (walltime)!
     • Consider 1024 serial jobs (or distributed worker tasks from a parallel code), each running for 30 minutes or less, on 4-core compute nodes
       – 75% of cores, i.e. 3 out of 4 cores per node, idle (mostly) at worst, for 30 minutes
       – Again, 64 active cores and 192 idle cores
       – Again, the HSLL network connecting those 64 nodes is unutilized
       – BUT only for a short period of time, and most likely not holding a large parallel job up
     • Scenario? Much better system utilization via backfill
       – A constant flow of such jobs is good on massive parallel systems (with HSLL)

  8. TeraGrid Job Distribution (Oct 2004-06)
     • Plotted job characteristics:
       – Number of CPUs used by each job vs. number of jobs
       – Walltime for serial jobs vs. number of jobs
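For context, the plots on the next two slides boil down to two simple aggregations over accounting records. The sketch below assumes a simplified per-job record of (CPUs, walltime); real TeraGrid accounting logs carry more fields and vary by site, so this illustrates only the binning, not the actual analysis code used for the survey.

```python
# Hedged sketch of the aggregation behind the two distribution plots.
from collections import Counter

def job_distributions(jobs):
    """jobs: iterable of (cpus, walltime_hours) tuples, one per job."""
    # CPUs per job vs. number of jobs
    cpus_hist = Counter(cpus for cpus, _ in jobs)
    # Walltime (binned to whole hours) vs. number of jobs, serial jobs only
    serial_walltime_hist = Counter(
        int(walltime) for cpus, walltime in jobs if cpus == 1
    )
    return cpus_hist, serial_walltime_hist

# Toy usage with made-up records:
sample = [(1, 0.5), (1, 48.0), (64, 6.0), (256, 12.0), (1, 72.0)]
cpus_hist, serial_hist = job_distributions(sample)
print(cpus_hist)     # Counter({1: 3, 64: 1, 256: 1})
print(serial_hist)   # Counter({0: 1, 48: 1, 72: 1})
```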

  9. TeraGrid Job Distribution: Number of CPUs per job vs. Number of jobs [plot]

  10. TeraGrid Job Distribution: Walltime for serial jobs vs. Number of jobs [plot]

  11. TeraGrid Job Distribution (contd...)
     • The plots on the previous slides:
       – Show serial vs. multi-CPU jobs (close to 60% serial)
       – Do not distinguish coarse-grained from fine-grained parallel jobs
     • That distinction is hard to recover from the available logs, though it could be captured at the allocation stage

  12. Job distribution on the TeraGrid vs. on IU resources
     • Is the job distribution on the TeraGrid fully reflective of the computational user base?
       – Many legacy applications run serially (unless a web service wraps them to run differently, which is still a research topic)
       – Coarse-grained parallel applications abound: embarrassingly parallel, with little communication
     • For instance, on IU's Big Red:
       – Part of the system is dedicated to the TeraGrid; the rest serves local users
       – Larger serial user base
       – Plus, as noted earlier, users running embarrassingly parallel code do not need the HSLL network
       – A continuation of the trend seen on past systems

  13. Revisiting backfill
     • So, are serial jobs inherently bad for parallel machines? No!
     • Use parallel jobs with low CPU-count and walltime requirements, as well as shorter serial jobs, to fill in while the scheduler accumulates nodes for a large parallel job
     • Great for increasing system utilization
       – But does backfill always work?
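For readers unfamiliar with backfill, the sketch below shows the core admission test in an EASY-style backfill policy. It is a simplified illustration, not the scheduler actually used on TeraGrid machines: a waiting job may jump ahead only if it fits in the currently free nodes and its requested walltime ends before the reservation held for the large parallel job at the head of the queue.

```python
# Minimal EASY-style backfill sketch (illustrative simplification):
# while nodes are being accumulated for the large parallel job at the head of
# the queue, a job may start now only if it fits in the currently free nodes
# and is guaranteed to finish before that job's reservation begins.

def can_backfill(job, free_nodes, now, reservation_start):
    """job: dict with 'nodes' and 'walltime_hours' (requested, not actual)."""
    fits = job["nodes"] <= free_nodes
    finishes_in_time = now + job["walltime_hours"] <= reservation_start
    return fits and finishes_in_time

now, reservation_start = 0.0, 6.0   # big parallel job reserved to start in 6 hours
free_nodes = 48                     # nodes idle while the reservation accumulates

print(can_backfill({"nodes": 1,  "walltime_hours": 0.5}, free_nodes, now, reservation_start))  # True: short serial job
print(can_backfill({"nodes": 16, "walltime_hours": 4.0}, free_nodes, now, reservation_start))  # True: small, short parallel job
print(can_backfill({"nodes": 1,  "walltime_hours": 72},  free_nodes, now, reservation_start))  # False: long serial job must wait
```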

  14. When backfill is great...
     • Backfill is great when there is a large number of short serial (or parallel) jobs
       – Monte Carlo simulations: lots and lots of really short serial jobs
       – Applications that distribute one big task into many short-running serial or threaded tasks
       – These fill up nodes that are otherwise kept idle waiting for a massive parallel job
     • Serial jobs are not strictly necessary; smaller parallel jobs can also serve as backfill (for larger parallel jobs)

  15. Increased queue wait times
     • Backfill does not work when...
     • Serial or coarse-grained parallel jobs are long (walltime)
       – Long serial jobs wait in the queue because the scheduler is usually configured to give preference to large parallel jobs
       – Frustration for the serial user: "Just one (or a few) long single-CPU job(s); why can't I run?"

  16. Increased queue wait times (contd...)
     • Another scenario: large serial job(s) sneak in...
     • What if a set of long (walltime) serial or coarse-grained parallel jobs sneaks into the running state before a large parallel job arrives in the queue?
       – The parallel job(s) wait(s) until all, or a subset, of the serial jobs complete
       – In this case, three-quarters or more of the cores on the nodes used by those long-walltime jobs again sit idle
       – And, repeating one more time, the expensive networking gear connecting those nodes also sits idle!
       – Frustration for the parallel user: "My job is tailor-made for this system, but I am waiting because of these serial jobs!"
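This second scenario can also be put in numbers. The sketch below uses invented node counts and remaining walltimes to estimate how long an arriving parallel job waits once long serial jobs already hold most of the nodes; none of the figures come from the survey.

```python
# Sketch with hypothetical numbers: long serial jobs grabbed nodes before a
# large parallel job arrived, so the parallel job waits until enough of them
# finish to free the node count it needs.

def parallel_start_time(serial_remaining_hours, total_nodes, parallel_nodes_needed):
    """serial_remaining_hours: remaining walltime of each running serial job,
    one entry per node it occupies. Returns hours until the parallel job can start."""
    free_now = total_nodes - len(serial_remaining_hours)
    if free_now >= parallel_nodes_needed:
        return 0.0
    # Nodes free up in order of each serial job's remaining walltime.
    need = parallel_nodes_needed - free_now
    return sorted(serial_remaining_hours)[need - 1]

# 64-node machine, 60 nodes held by serial jobs with 24-72 hours left each,
# and a parallel job that needs 48 nodes: it waits for the 44th-soonest finish.
remaining = [24 + (i % 7) * 8 for i in range(60)]
print(parallel_start_time(remaining, total_nodes=64, parallel_nodes_needed=48))  # 56 hours
```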

  17. Specialized serial machines?
     • Certain resources without an HSLL network
       – Or certain resources with a mix of HSLL-connected nodes and Ethernet-connected nodes
     • Allocate large serial applications, and long-running coarse-grained parallel applications, there
       – Still use short serial jobs as backfill on the massive parallel machine
       – The threshold would be based on need, and would vary with each allocation meeting: are parallel cycles in short supply, or serial cycles?
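The allocation-time routing idea on this slide can be sketched as a simple policy function. The 24-hour threshold and the function name below are invented placeholders; as the slide notes, the real threshold would be revisited at each allocation meeting depending on which cycles are scarce.

```python
# Hedged sketch of allocation-time routing: send long serial and long
# coarse-grained work to a specialized serial (or Ethernet-only) partition,
# keep short jobs on the HSLL machine as backfill.

BACKFILL_WALLTIME_THRESHOLD_HOURS = 24   # hypothetical value, revisited per allocation cycle

def route_request(needs_hsll, walltime_hours):
    if needs_hsll:
        return "parallel machine (HSLL)"            # fine-grained parallel code
    if walltime_hours <= BACKFILL_WALLTIME_THRESHOLD_HOURS:
        return "parallel machine (backfill queue)"  # short serial / coarse-grained work
    return "specialized serial machine"             # long serial / coarse-grained work

print(route_request(needs_hsll=False, walltime_hours=72))   # specialized serial machine
print(route_request(needs_hsll=False, walltime_hours=0.5))  # parallel machine (backfill queue)
print(route_request(needs_hsll=True,  walltime_hours=12))   # parallel machine (HSLL)
```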

  18. Lower financial barrier for new RPs
     • Having serial systems does not only avoid wasting expensive network gear and CPUs
     • It also lowers the financial barrier for new resource providers to join the TeraGrid
       – Spend less on the entire system!
       – Or relocate the funds that would have gone to HSLL networking gear toward more CPUs
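The "relocate funds" point amounts to simple arithmetic: if the HSLL fabric is roughly one-third of system cost (the slides' rule of thumb), the same budget spent without it buys about 1.5 times as many nodes. The budget and per-node cost below are hypothetical.

```python
# Back-of-the-envelope sketch; the dollar figures are invented placeholders.
budget = 1_000_000        # hypothetical system budget
network_fraction = 1 / 3  # slides: HSLL gear is "most often one-third of system cost"
node_cost = 5_000         # hypothetical per-node cost

nodes_with_hsll = budget * (1 - network_fraction) / node_cost
nodes_without_hsll = budget / node_cost
print(nodes_with_hsll, nodes_without_hsll)   # ~133 vs. 200 nodes, a 1.5x increase
```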

  19. Training wheels for new RPs
     • New RPs already have a lot on their hands:
       – Hooking up to the TeraGrid network backbone
       – CTSS
       – Accounting and usage (AMIE, etc.)
       – Myrinet and InfiniBand require specialized skills to maintain; even experienced sys-admins find them challenging

  20. Single point of entry for ALL computational users
     • Right now, the legacy serial code user base may find it hard to get cycles on the TeraGrid (unless they have some sort of threading in their code)
     • Provide more variety in available resources:
       – Massive parallel systems
       – Systems that provide serial cycles, or parallel cycles without HSLL
       – SMP systems
     • A single point of entry for all types of computational users in the US: the TeraGrid
       – The 10 Gig pipe requirement may also be a barrier, as of now, for smaller resource providers

  21. Conclusion
     • Myrinet, InfiniBand, and other high-speed, low-latency networks are very expensive and require specialized maintenance
     • A large subset of users still run legacy serial applications or coarse-grained parallel applications
     • With specialized serial machines or hybrid parallel machines:
       – Allocate large serial requests and coarse-grained parallel requests to those machines
       – Better user experience (especially queue wait times) for both massive parallel users and long-running serial users
       – Lower financial barrier for new RPs, and a gentler learning curve for sys-admins
       – TeraGrid as a single point of entry for computational users
