

  1. NSF Future of High Performance Computing Bill Kramer NSF Workshop on the Future of High Performance Computing • Washington DC December 4, 2009

  2. Why Sustained Performance is the Critical Focus
  • Memory Wall
    • Limitation on computation speed caused by the growing disparity between processor speed and memory latency and bandwidth
    • From 1986 to 2000, processor speed increased at an annual rate of 55%, while memory speed improved by only 10% per year
  • Issue
    • Memory latency and bandwidth limitations within the processor make it difficult to achieve a major fraction of a chip's peak performance
    • Latency and bandwidth limitations of the communication fabric make it difficult to scale science and engineering applications to large numbers of processors
  [Chart: Relationship between Peak, Linpack and Sustained Performance Using SSP; Peak (TF), Linpack (TF), and Normalized SSP (TF) for NERSC systems, 1997–2008]
  [Chart: Ratio Linpack/SSP for NERSC Systems, 1997–2008]

  3. Recommendation
  • Adopt a longer-term focus, rather than the three-to-five-year focus, which is really just the useful lifetime of a single system.
    • Achieving and using an Exascale system, or the equivalent of tens of 100-Petascale systems, will span 15 years and a progression of resource deployments.
    • NSF would be well served to create a 15-year funding program that combines the total cost of acquiring, supporting and using the resources.
    • This strategy should include creating a supporting facility infrastructure that allows efficient technology refreshes to be quickly deployed and integrated with the existing resources.
  • To enable effective resource insertion, NSF should separate the selection of organizations that provision and support HPC resources from the resource selection itself.
    • The current NSF practice of issuing separate solicitations that combine an organization as a service provider and a sole system choice for each resource refreshment leads to sub-optimization that can result in neither the most effective organization nor the best-value technology.
  • Focus on true application sustained performance.
    • Using something like “Sustained System Performance” to determine the best-value resource solutions will enable NSF to have the most cost-effective computing environments for the computational science communities.
    • Use state-of-the-practice open, best-value procurements that enable comparing technology choices on sustained performance while allowing vendors flexibility.
    • NSF should take the lead in redefining the debate, away from simple metrics and the TOP500 and toward meaningful measures for science.

  4. Recommendation
  • NSF should follow the industry trend of concentrating its computational and data storage resources at a few locations that can then make long-term investments amortized over a series of technology refreshments.
    • These locations should be determined by the organizations' ability to manage large-scale, early-release systems, support an evolving computational science community, provide cost-effective extreme-scale infrastructure, and attract and engage world-class computer science and computational science staff.
  • NSF should develop an appropriate balance of ‘production quality’ and ‘experimental’ resources.
    • Production quality means systems built on well-known architectures (albeit possibly early-delivery versions of new generations) with proven Performance, Effectiveness, Reliability, Consistency and Usability, whose primary mission is use for computational science.
    • ‘Experimental’ resources are those with the potential to be disruptive technologies leading to significant (~10x) performance and/or price-performance improvements.
    • These two types of systems have clearly different missions.
    • A typical investment strategy might be 85% production / 15% experimental.
  • NSF should establish a “best practice” review of both US-funded resources and international funding programs.
  • NSF should invest in “performance-based design” for all application areas.

  5. Geographic Distribution of PRAC Leaders

  6. Recommendation
  • NSF should separate the provisioning of a national science network from middleware software and/or compute and storage resource provisioning.
    • A national science network that serves the extreme-scale computational and data resources, the major communities of computational and data scientists, and the major observational and experimental resources needs a long-term roadmap with consistent funding and a plan for technology insertion. A model for such a plan can be found in the DOE’s ESnet program, among others.
  • NSF should likewise have a sustained program for distributed (aka cloud) middleware software creation and support.
    • This support needs to be synchronized with the computational, data and networking components of the NSF strategy, but needs to be an independent program component.
  • NSF should support expanded development and evolution of extreme-scale system software aligned with the IESP roadmap.
  • There are contract arrangements that can assure both high-quality systems and services and innovation and advanced technology, in whatever balance NSF needs.
    • Performance- and rewards-based contracts
    • Deployment project management and ongoing operational assessments a la ITIL
    • Example: an agreement with a 6-year base term, renewable for up to a total of 16 years
    • Automatic as well as discretionary extensions that benefit both NSF and the providing organizations

  7. ADDITIONAL SLIDES

  8. A Generalized Sustained System Performance (SSP) Framework
  • Is an effective and flexible way to evaluate systems
  • Determine the Sustained System Performance for each phase of each system:
    1. Establish a set of performance tests that reflect the intended work the system will do
      • Can be any number of tests, as long as they have a common measure of performance
    2. A test consists of a code and a problem set
    3. Establish the amount of work (ops) the test needs to do for a fixed concurrency or a fixed problem set
    4. Time each test execution, using wall-clock time
    5. Determine the amount of work done per scalable unit (node, socket, core, task, thread, interface, etc.)
      • Work per unit = total operations / (total time × number of scalable units used for the test)
    6. Composite the work per scalable unit for all tests
      • Composite functions are chosen based on circumstances and test selection criteria
      • Tests can be weighted or not, as desired
    7. Determine the SSP of a system at any time period by multiplying the composite work per scalable unit by the number of scalable units in the system
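The steps above can be sketched in a few lines of Python. This is a minimal illustration of the framework, not NERSC's actual SSP implementation; the benchmark names, operation counts, and the choice of a geometric mean as the composite function are all assumptions.

```python
import math

def work_per_unit(total_ops, wall_time, num_units):
    """Step 5: work rate per scalable unit (ops per second per unit)."""
    return total_ops / (wall_time * num_units)

def composite(rates, weights=None):
    """Step 6: composite the per-unit rates across all tests.
    A (optionally weighted) geometric mean is one reasonable choice."""
    if weights is None:
        weights = [1.0] * len(rates)
    total_w = sum(weights)
    return math.exp(sum(w * math.log(r) for r, w in zip(rates, weights)) / total_w)

def ssp(tests, system_units):
    """Step 7: SSP = composite per-unit work x number of scalable units."""
    rates = [work_per_unit(t["ops"], t["time"], t["units"]) for t in tests]
    return composite(rates) * system_units

# Hypothetical benchmark results: ops in Tflop, time in seconds, units = cores used
tests = [
    {"name": "app_a", "ops": 500.0, "time": 100.0, "units": 64},
    {"name": "app_b", "ops": 800.0, "time": 200.0, "units": 128},
]
print(ssp(tests, system_units=10000))  # SSP in TF/s for a 10,000-core system
```

Because every test reports in a common unit (Tflop/s per core here), any number of tests can be composited, and re-evaluating `ssp` with a new `system_units` count captures a technology refresh or upgrade phase.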

  9. Examples of Using the SSP Framework
  • Test a system upon delivery, use it to select a system, etc.
  • Determine the Potency of the system: how well the system will perform the expected work over some time period
    • Potency is the sum, over the specified time, of the product of a system's SSP and the duration of the period for which that SSP holds
    • Different SSPs for different periods
    • Different SSPs for different types of computation units (heterogeneous systems)
  • Determine the Cost of systems
    • Cost can be in any resource units ($, Watts, space, …) and of any complexity (initial, TCO, …)
  • Determine the Value of the system
    • Value is the potency divided by a cost function
  • If needed, compare the value of different system alternatives or compare against expectations
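The Potency and Value definitions above can be sketched as follows. This is a minimal illustration under assumed numbers: a hypothetical system with two deployment phases at different SSP levels and a simple scalar dollar cost.

```python
def potency(phases):
    """Potency = sum over phases of (SSP during the phase) x (phase duration)."""
    return sum(ssp_level * duration for ssp_level, duration in phases)

def value(phases, cost):
    """Value = potency divided by a cost function (here, a single scalar cost)."""
    return potency(phases) / cost

# Hypothetical system: 100 TF/s sustained for 2 years, then upgraded
# to 250 TF/s for 3 more years, at a total cost of $50M.
phases = [(100.0, 2.0), (250.0, 3.0)]   # (SSP in TF/s, years)
p = potency(phases)                     # 100*2 + 250*3 = 950.0 TF-years
v = value(phases, cost=50.0)            # 950 / 50 = 19.0 TF-years per $M
print(p, v)
```

Swapping in a different cost function (total cost of ownership, watts, floor space) changes only the denominator, which is what lets the same framework compare very different system alternatives.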
