
HPC Environment Management: New Challenges in the Petaflop Era


  1. HPC Environment Management: New Challenges in the Petaflop Era Jonas Dias jonas@nacad.ufrj.br Albino Aveleda bino@nacad.ufrj.br VECPAR’10

  2. Agenda 1. Introduction 2. Available Tools 2.1. Deployment 2.2. Monitoring 2.3. Proprietary Solutions 3. LEMMing Project 4. Conclusion (Footer: HPC Environment Management: New Challenges in the Petaflop Era, 6/24/2010)

  3. Introduction • High Performance Computing Systems – Universities – Research centers – Experiments, simulations – Industry Sector • 62.4% (11/09 top500.org list) • Petaflop barrier

  4. Growth of processors per system

  5. Management and Monitoring Tools • Systems with many processors • Organized information – List of nodes – Hierarchical approach • Usability – Expert and non-expert managers

  6. Managing a Supercomputer • Grant secure access • Quickly handle defects and problems • Offer a queue system • Use some monitoring tools • Support non-uniform infrastructure • Integrate with local tools

  7. Administering an HPC center • Proprietary Software – Integration – Usability • An open source proposal – LEMMing • Single point of management • RIA • Customization

  8. Available Tools • Deployment Tools – OSCAR – ROCKS – xCAT • Monitoring Tools – Cacti – Ganglia – Nagios

  9. The Deployment • Should be easy – GUI – Out of the box installation • Integrate management features – Node adding and removal – Changes in properties • Basic HPC Tools – MPI – Queue system • Offer monitoring tools

  10. Comparison

      Tool   Cluster Installation  Queuing System  MPI  Monitoring Tool  Node Adding
      OSCAR  GUI                   Yes             Yes  Yes              GUI + Network listening
      Rocks  GUI                   Yes             Yes  Yes              UI + Network listening
      xCAT   Command Line          No              No   No               Command Line + Manual Adding

  11. Monitoring Tools • Web based – Easy access • Rich Internet Application • Alert sending • Customizable – Plug-ins • The monitoring focus

  12. Comparison

      Tool     Web Based  RIA  Send Alert  Plugins  Monitoring focus
      Cacti    Yes        No   No          Yes      Network
      Ganglia  Yes        No   No          No       Cluster/Grid
      Nagios   Yes        No   Yes         Yes      Network

  13. Proprietary solutions • Usually use some open source apps – OSCAR, Rocks, xCAT, Ganglia, Nagios, Cacti… • Tune the cluster configuration • Proprietary tools for administration – Hardware specific • Poor integration – Different vendors

  14. Challenges for an HPC environment • Increasing number of processors • Heterogeneous environments – Resources from different machines • Particular/Local tools • Administrators with different levels of knowledge • Present available resources as a whole – Organized and customized

  15. Node naming and organization – Simple form • An example: node 0, node 1, node 2, node 3, node 4, node 5, node 6, node 7, node 8 • Expansion: node 0 through node 8, plus node 9, node 10, node 11, node 12

  16. Node naming and organization – Hierarchical approach • An example: c0n0, c0n1, c0n2, c0n3, c0n4; c1n0, c1n1, c1n2, c1n3 • Expansion: c0n0 through c0n6; c1n0 through c1n5
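
The hierarchical scheme on slide 16 can be sketched in a few lines of Python. This is only an illustration, not code from LEMMing: the function names are hypothetical, and the slide does not say what the `c` prefix stands for (rack, cabinet, chassis, or cluster), so the sketch simply calls it a group.

```python
import re
from collections import defaultdict

# Pattern for names such as "c0n4": group index, then node index.
# What "c" denotes physically is an assumption left open here.
NODE_NAME = re.compile(r"^c(\d+)n(\d+)$")


def node_name(group: int, node: int) -> str:
    """Build a hierarchical node name, e.g. (1, 3) -> 'c1n3'."""
    return f"c{group}n{node}"


def parse_node_name(name: str) -> tuple[int, int]:
    """Split a hierarchical name into (group, node) indices."""
    match = NODE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a hierarchical node name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def group_nodes(names: list[str]) -> dict[int, list[int]]:
    """Map each group index to the sorted node indices it contains."""
    groups: dict[int, list[int]] = defaultdict(list)
    for name in names:
        group, node = parse_node_name(name)
        groups[group].append(node)
    return {g: sorted(nodes) for g, nodes in sorted(groups.items())}


def next_names(groups: dict[int, list[int]], group: int, count: int) -> list[str]:
    """Names for `count` new nodes appended to one group."""
    existing = groups.get(group, [])
    start = max(existing) + 1 if existing else 0
    return [node_name(group, i) for i in range(start, start + count)]


if __name__ == "__main__":
    # The initial layout from the slide: five nodes in c0, four in c1.
    initial = [node_name(0, i) for i in range(5)] + [node_name(1, i) for i in range(4)]
    layout = group_nodes(initial)
    print(layout)
    print(next_names(layout, 0, 2), next_names(layout, 1, 2))
```

With names of this form, an expansion only extends the affected group, so a node's name keeps telling the administrator where it sits, which the flat `node N` numbering on slide 15 cannot do once new nodes are appended at the end.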

  17. LEMMing Project • Inspired by the Zimbra Collaboration Suite • Use Open Source tools • Use AJAX technologies • LEMMing is not an extension – Less dependent – Great usability

  18. What is LEMMing? • Linux Enterprise Management and Monitoring • Clusters with thousands of nodes – Many failures • Flexibility • Ease of adding features • Great usability – Detect and solve problems faster

  19. Features • Freeware • Web Service based • AJAX interface design • Integration of other tools • Single point of management • Tested with Rocks clusters • Support for many cluster topologies • Integrated with workload management • Parallel shell tools • Customizable Dashboard

  20. LEMMing Modules • LEMM-WS – Web Services – Coupled to the supercomputer – API • LEMM-GATE – Web application – Independent of the cluster
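
The split between a service coupled to the cluster and a separate web front end can be sketched as follows. This is not LEMMing's actual API: the `/nodes` endpoint, the JSON shape, the node states, and every function name below are assumptions made up for illustration, using only the Python standard library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Made-up node states standing in for data gathered on the cluster side.
NODES = {"c0n0": "up", "c0n1": "down", "c1n0": "up"}


class NodeStatusHandler(BaseHTTPRequestHandler):
    """Cluster-side service (the LEMM-WS role): exposes state as JSON."""

    def do_GET(self):
        if self.path == "/nodes":  # hypothetical endpoint
            body = json.dumps(NODES).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_service(port: int = 0) -> HTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), NodeStatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_nodes(base_url: str) -> dict:
    """Front-end side (the LEMM-GATE role): knows only the service URL."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(f"{base_url}/nodes", timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    server = start_service()
    port = server.server_address[1]
    nodes = fetch_nodes(f"http://127.0.0.1:{port}")
    print([name for name, state in nodes.items() if state != "up"])
    server.shutdown()
    server.server_close()
```

Because the front end depends only on a URL and a data format, it can run on a different machine and, in principle, talk to several such services, which is one way to read the "independent of the cluster" and "single point of management" bullets.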

  21. LEMMing Modules Relationship

  22. LEMM-GATE interface

  23. LEMM-GATE interface

  24. Conclusion • Huge HPC centers – Heterogeneous machines – Many nodes per cluster • LEMMing – Integrates the management and monitoring software stacks of multiple clusters – Rich internet application – Open Source model – Use of available tools

  25. Future Work • Add support for different cluster systems • IPMI support • Queue management • Visit us: – http://lemm.sf.net – Check the video demonstration

  26. Acknowledgments • The authors thank: – High Performance Computing Center – Professor Alvaro Coutinho – DELL Brazil

  27. HPC Environment Management: New Challenges in the Petaflop Era Thanks! Jonas Dias jonas@nacad.ufrj.br Albino Aveleda bino@nacad.ufrj.br VECPAR’10
