Grid Computing in Numerical Relativity and Astrophysics

  1. Grid Computing in Numerical Relativity and Astrophysics
     Gabrielle Allen (gallen@cct.lsu.edu)
     Depts of Computer Science & Physics, Center for Computation & Technology (CCT), Louisiana State University
     Challenge Problems
     • Cosmology
     • Black hole and neutron star models
     • Supernovae
     • Astronomical databases
     • Gravitational wave data analysis
     • Drive HEC & Grids

  2. Gravitational Wave Physics
     [Diagram: observations and models feed complex simulations, which together yield analysis & insight]

  3. Computational Science Needs
     Requires an incredible mix of technologies & expertise!
     • Many scientific/engineering components
       – Physics, astrophysics, CFD, engineering, ...
     • Many numerical algorithm components
       – Finite difference? Finite volume? Finite elements?
       – Elliptic equations: multigrid, Krylov subspace, ...
       – Mesh refinement
     • Many different computational components
       – Parallelism (HPF, MPI, PVM, ???)
       – Multipatch
       – Architecture (MPP, DSM, vector, PC clusters, FPGA, ???)
       – I/O (simulations generate TBs, checkpointing, ...)
       – Visualization of all that comes out!
     • New technologies
       – Grid computing
       – Steering, data archives
     • Such work cuts across many disciplines and areas of CS.

     Cactus Code
     • Freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations (a toy sketch of the framework idea follows below)
     • Developed for numerical relativity, but now a general framework for parallel computing (CFD, astrophysics, climate modeling, chemical engineering, quantum gravity, ...)
     • Finite difference, adaptive mesh refinement (Carpet, Samrai, Grace); adding FE/FV, multipatch
     • Active user and developer communities; main development now at LSU and AEI
     • Open source, documentation, etc.
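To make the "modular framework" point concrete, here is a toy sketch in C of the idea behind Cactus: independent modules ("thorns") register routines at named schedule points, and a small core ("flesh") drives the run by calling whatever is registered, so physics and analysis contributed by different groups stay decoupled. Every name in the sketch is invented for illustration; the real Cactus flesh schedules thorns through its CCL configuration files and CCTK interfaces, not this API.

    /* Toy sketch (hypothetical names, not the real Cactus API) of a flesh
     * driving routines that independent thorns have registered. */
    #include <stdio.h>

    #define MAX_ROUTINES 8
    typedef void (*routine)(int step);

    typedef struct { routine fn[MAX_ROUTINES]; int n; } schedule_point;
    static schedule_point evol, analysis;

    static void schedule(schedule_point *p, routine fn) { p->fn[p->n++] = fn; }
    static void run_point(schedule_point *p, int step)
    {
        for (int i = 0; i < p->n; i++) p->fn[i](step);
    }

    /* "thorn" 1: an evolution module */
    static void evolve_metric(int step) { printf("step %d: evolve metric\n", step); }
    /* "thorn" 2: an analysis module another group might contribute */
    static void find_horizon(int step)  { printf("step %d: look for horizon\n", step); }

    int main(void)
    {
        schedule(&evol, evolve_metric);       /* each thorn registers itself */
        schedule(&analysis, find_horizon);

        for (int step = 0; step < 3; step++) {   /* the flesh's main loop */
            run_point(&evol, step);
            run_point(&analysis, step);
        }
        return 0;
    }

Because the flesh only knows about schedule points, a new analysis thorn can be added or dropped without touching the evolution code, which is what lets many groups share one production code base.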

  4. Cactus Einstein
     • Cactus modules (thorns) for numerical relativity
     • Many additional thorns available from other groups (AEI, CCT, ...)
     • Agree on some basic principles (e.g. names of variables) and then evolution, analysis, etc. can be shared
     • Can choose whether or not to use e.g. gauge choice, macros, masks, matter coupling, conformal factor
     • Over 100 relativity papers & 30 student theses: a production research code
     [Diagram: thorn arrangement — ADMBase; Evolve (ADM, EvolSimple); Analysis (ADMAnalysis, ADMConstraints, AHFinder, Extract, PsiKadelia, TimeGeodesic); InitialData (IDAnalyticBH, IDAxiBrillBH, IDBrillData, IDLinearWaves, IDSimple); Gauge Conditions (CoordGauge, Maximal); support thorns SpaceMask, ADMCoupling, ADMMacros, StaticConformal]

     Grand Challenge Collaborations
     • NASA Neutron Star Grand Challenge: 5 US sites, 3 years, colliding neutron star problem
     • NSF Black Hole Grand Challenge: 8 US institutions, 5 years, attack the colliding black hole problem
     • EU Astrophysics Network: 10 EU sites, 3 years, continuing these problems

     Examples of the Future of Science & Engineering
     • Require large-scale simulations, beyond the reach of any machine
     • Require large, geo-distributed, cross-disciplinary collaborations
     • Require Grid technologies, but not yet using them!

  5. New Paradigm: Grid Computing
     • Computational resources across the world
       – Compute servers (double each 18 months)
       – File servers
       – Networks (double each 9 months)
       – Playstations, cell phones, etc.
     • Grid computing integrates communities and resources
     • How to take advantage of this for scientific simulations?
       – Harness multiple sites and devices
       – Models with a new level of complexity and scale, interacting with data
       – New possibilities for collaboration and advanced scenarios

     NLR and Louisiana Optical Network (LONI)
     • State initiative ($40M) to support research: 40 Gbps optical network
     • Connects 7 sites
     • Grid resources (IBM P5) at the sites
     • LIGO/CAMD
     • New possibilities: dynamical provisioning and scheduling of network bandwidth; network-dependent scenarios
     • "EnLIGHTened" Computing (NSF)

  6. Current Grid Application Types
     • Community driven
       – Distributed communities share resources
       – Video conferencing
       – Virtual collaborative environments
     • Data driven
       – Remote access of huge data sets, data mining
       – E.g. gravitational wave analysis, particle physics, astronomy
     • Process/simulation driven
       – Demanding simulations in science and engineering
       – Task farming, resource brokering, distributed computations, workflow
       – Remote visualization, steering and interaction, etc.
     • Typical scenario: find remote resources (task farm, distribute), launch jobs (static), visualize and collect results
     • Prototypes and demos need to move to: fault tolerance, robustness, scaling, ease of use, complete solutions

     New Paradigms for Dynamic Grids
     • Addressing large, complex, multidisciplinary problems with collaborative teams of varied researchers ...
     • Code/user/infrastructure should be aware of the environment
       – Discover and monitor resources available NOW
       – What is my allocation on these resources?
       – What is the bandwidth/latency?
     • Code/user/infrastructure should make decisions (a hypothetical decision-loop sketch follows below)
       – Slow part of the simulation can run independently ... spawn it off!
       – New powerful resources just became available ... migrate there!
       – Machine went down ... reconfigure and recover!
       – Need more memory (or less!), get it by adding (dropping) machines!
     • Dynamically provision and use new high-end resources and networks
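A minimal sketch of the "code should be aware and make decisions" idea above. All of the helper functions are hypothetical stubs standing in for real monitoring and grid services; the point is only the shape of the loop: evolve, periodically inspect the environment, then spawn or migrate.

    /* Hypothetical decision loop; none of these names come from a real grid API. */
    #include <stdio.h>
    #include <stdbool.h>

    /* Stubs standing in for real monitoring/grid services. */
    static bool faster_resource_available(void)  { return false; }  /* e.g. query an information service */
    static bool analysis_is_slowing_us_down(void){ return true;  }  /* e.g. compare timer data */
    static void checkpoint_and_migrate(void)     { puts("migrating to new resource"); }
    static void spawn_analysis_task(int step)    { printf("spawned analysis of step %d elsewhere\n", step); }
    static void evolve_one_step(int step)        { (void)step; /* the actual physics update */ }

    int main(void)
    {
        for (int step = 0; step < 10; step++) {
            evolve_one_step(step);

            /* every few steps, look at the environment and decide */
            if (step % 5 == 0) {
                if (analysis_is_slowing_us_down())
                    spawn_analysis_task(step);   /* run it asynchronously on a cheaper machine */
                if (faster_resource_available())
                    checkpoint_and_migrate();    /* move the main run to the better resource */
            }
        }
        return 0;
    }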

  7. Future Dynamic Grid Computing
     • "We see something, but it is too weak. Please simulate to enhance the signal!"
     [Diagram: a main simulation spawns and migrates sub-tasks (S1, S2, P1, P2) across sites]
     [Diagram: an example scenario spanning RZG, SDSC, LRZ, AEI and NCSA — queue time over, find a new machine; add more resources; clone the job with a steered parameter; archive data; calculate/output invariants; found a black hole, load a new component and look for the horizon; calculate/output gravitational waves and archive to the LIGO experiment; find the best resources for further calculations]

  8. New Grid Scenarios
     • Intelligent parameter surveys, speculative computing, Monte Carlo (a task-farming sketch follows below)
     • Dynamic staging: move to a faster/cheaper/bigger machine
     • Multiple universe: create a clone to investigate a steered parameter
     • Automatic component loading: the needs of the process change; discover/load/execute a new calculation component on an appropriate machine
     • Automatic convergence testing
     • Look ahead: spawn off and run a coarser resolution to predict the likely future
     • Spawn independent/asynchronous tasks: send them to a cheaper machine, the main simulation carries on
     • Routine profiling: best machine/queue, choose resolution parameters based on the queue
     • Dynamic load balancing: inhomogeneous loads, multiple grids
     • Inject dynamically acquired data

     But ... Need Grid Apps and Programming Tools
     • Need application programming tools for Grid environments
       – Frameworks for developing Grid applications
       – Toolkits providing Grid functionality
       – Grid debuggers and profilers
       – Robust, dependable, flexible Grid tools
     • Challenging CS problems:
       – Missing or immature grid services
       – Changing environment
       – Different and evolving interfaces to the "grid"
       – Interfaces are not simple for scientific application developers
     • Application developers need easy, robust and dependable tools
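The first scenario above, an intelligent parameter survey, is essentially task farming. Below is a minimal, self-contained sketch using plain MPI: rank 0 hands out parameter values, the other ranks run a stand-in simulate() for each and send back results. The survey size and the trivial simulate() are assumptions of the sketch, not anything taken from Cactus or GridLab.

    /* Task-farming sketch: master/worker parameter survey over MPI. */
    #include <mpi.h>
    #include <stdio.h>

    /* stand-in for a real simulation run at one parameter value */
    static double simulate(double parameter) { return parameter * parameter; }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int ntasks = 16;                   /* size of the parameter survey */
        if (rank == 0) {
            int sent = 0, done = 0;
            /* hand every worker an initial parameter */
            for (int w = 1; w < size && sent < ntasks; w++, sent++) {
                double p = (double)sent;
                MPI_Send(&p, 1, MPI_DOUBLE, w, 1, MPI_COMM_WORLD);
            }
            /* collect results; keep each worker busy until the survey is done */
            while (done < sent) {
                double result;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 2, MPI_COMM_WORLD, &st);
                done++;
                printf("result from rank %d: %g\n", st.MPI_SOURCE, result);
                if (sent < ntasks) {
                    double p = (double)sent++;
                    MPI_Send(&p, 1, MPI_DOUBLE, st.MPI_SOURCE, 1, MPI_COMM_WORLD);
                }
            }
            /* tag 0 tells the workers to stop */
            for (int w = 1; w < size; w++) {
                double dummy = 0.0;
                MPI_Send(&dummy, 0, MPI_DOUBLE, w, 0, MPI_COMM_WORLD);
            }
        } else {
            for (;;) {
                double p, r;
                MPI_Status st;
                MPI_Recv(&p, 1, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == 0) break;      /* stop signal */
                r = simulate(p);
                MPI_Send(&r, 1, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

In a grid setting the same master/worker pattern would be driven through a resource broker rather than a single MPI job, but the control flow is the same.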

  9. GridLab Project
     • EU 5th Framework ($7M)
     • Partners in Europe and US
       – PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun, Compaq/HP, LSU
     • Application and test bed oriented (Cactus + Triana)
       – Numerical relativity
       – Dynamic use of grids
     • Main goal: develop an application programming environment for the Grid
     • www.gridlab.org

     Grid Application Toolkit (GAT)
     • Abstract programming interface between applications and Grid services (an illustrative sketch of the idea follows below)
     • Designed for applications (move file, run remote task, migrate, write to remote file)
     • Led to the GGF Simple API for Grid Applications (SAGA)
     • Main result from the GridLab project: www.gridlab.org/GAT
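To illustrate what an abstract application-level interface like the GAT is for, here is a hypothetical sketch in C. The structure and function names (grid_adaptor, file_copy, run_task) are invented for this example and are not the real GAT or SAGA API; the point is that the application codes against a small, service-neutral interface while an adaptor chosen at runtime maps the calls onto whatever middleware (or plain local tools) is available.

    /* Hypothetical sketch of a GAT-like abstraction layer (not the real API). */
    #include <stdio.h>

    /* the abstract operations the application sees */
    typedef struct {
        int (*file_copy)(const char *src, const char *dst);
        int (*run_task)(const char *host, const char *cmd);
    } grid_adaptor;

    /* one possible adaptor: a "local" fallback that just logs what it would do */
    static int local_copy(const char *src, const char *dst)
    {
        printf("[local adaptor] copy %s -> %s\n", src, dst);
        return 0;
    }
    static int local_run(const char *host, const char *cmd)
    {
        printf("[local adaptor] run '%s' on %s\n", cmd, host);
        return 0;
    }

    int main(void)
    {
        /* in a real toolkit the adaptor would be selected at runtime
           (GridFTP, GRAM, ssh, ...); here we bind the local stub */
        grid_adaptor gat = { local_copy, local_run };

        /* the application code stays the same whichever adaptor is bound */
        gat.file_copy("checkpoint.h5", "gsiftp://remote.site/scratch/checkpoint.h5");
        gat.run_task("remote.site", "cactus_wave restart.par");
        return 0;
    }

This is exactly the property the slide stresses: the application asks to "move a file" or "run a remote task" without hard-coding any particular, and possibly changing, grid service interface.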

  10. Distributed Computation: Harnessing Multiple Computers
     • Why do this?
       – Capacity: single computers can't keep up with needs
       – Throughput: combine resources
     • Issues
       – Bandwidth (increasing faster than CPU)
       – Latency
       – Communication needs, topology
       – Communication/computation ratio
     • Techniques to be developed (see the sketch after this slide)
       – Overlapping communication/computation
       – Extra ghost zones to reduce latency
       – Compression
       – Algorithms that do this for the scientist

     Dynamic Adaptive Distributed Computation
     • NCSA Origin array (256+128+128 procs) + SDSC IBM SP (1024 procs), GigE (100 MB/sec) within sites, an OC-12 line between them (but only 2.5 MB/sec in practice)
     [Diagram: processor decomposition across the two sites, 5x12x(4+2+2) = 480 and 5x12x17 = 1020 processors]
     • Cactus + MPICH-G2 (with U. Chicago/Northern, Supercomputing 2001, Denver): "Gordon Bell Prize"
     • Communications dynamically adapt to the application and environment
     • Works for any Cactus application
     • Scaling: 15% -> 85%
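Of the techniques listed above, overlapping communication with computation is the easiest to show in a few lines. The sketch below (array sizes and the trivial update rule are assumptions for the illustration) posts non-blocking MPI ghost-zone exchanges, updates every point whose stencil needs no neighbour data while the messages are in flight, and only then finishes the two edge points.

    /* Latency hiding for a 1-D stencil: overlap ghost-zone exchange with the
     * interior update using non-blocking MPI. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024   /* interior points per process (assumed for the sketch) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        static double u[N + 2], unew[N + 2];        /* one ghost cell per side */
        for (int i = 0; i <= N + 1; i++) u[i] = rank + 0.001 * i;

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        MPI_Request req[4];
        /* 1. start the ghost-zone exchange (non-blocking) */
        MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&u[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&u[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* 2. overlap: update all points whose stencil needs no ghost data */
        for (int i = 2; i <= N - 1; i++)
            unew[i] = 0.25 * (u[i - 1] + 2.0 * u[i] + u[i + 1]);

        /* 3. wait for the messages, then finish the two edge points */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        unew[1] = 0.25 * (u[0] + 2.0 * u[1] + u[2]);
        unew[N] = 0.25 * (u[N - 1] + 2.0 * u[N] + u[N + 1]);

        if (rank == 0) printf("one overlapped update step done\n");
        MPI_Finalize();
        return 0;
    }

Carrying extra ghost zones takes this one step further: with g ghost cells per side a process can take g local update steps between exchanges, trading extra computation and memory for fewer high-latency messages over slow wide-area links.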
