HPC Environment Management: New Challenges in the Petaflop Era
Jonas Dias (jonas@nacad.ufrj.br)
Albino Aveleda (bino@nacad.ufrj.br)
VECPAR’10
Agenda
1. Introduction
2. Available Tools
   1. Deployment
   2. Monitoring
   3. Proprietary Solutions
3. LEMMing Project
4. Conclusion
HPC Environment Management: New Challenges in the Petaflop Era, 6/24/2010
Introduction
• High Performance Computing systems
  – Universities
  – Research centers
  – Experiments, simulations
  – Industry sector: 62.4% (November 2009 top500.org list)
• The petaflop barrier
Growth of processors per system
Management and Monitoring Tools
• Systems with many processors
• Organized information
  – List of nodes
  – Hierarchical approach
• Usability
  – Expert and non-expert managers
Managing a Supercomputer
• Grant secure access
• Quickly handle defects and problems
• Offer a queue system
• Use monitoring tools
• Support non-uniform infrastructure
• Integrate with local tools
Administering an HPC Center
• Proprietary software
  – Integration
  – Usability
• An open source proposal
  – LEMMing
• Single point of management
• RIA (Rich Internet Application)
• Customization
Available Tools
• Deployment tools
  – OSCAR
  – Rocks
  – xCAT
• Monitoring tools
  – Cacti
  – Ganglia
  – Nagios
Deployment
• Should be easy
  – GUI
  – Out-of-the-box installation
• Should integrate management features
  – Adding and removing nodes
  – Changing properties
• Should provide basic HPC tools
  – MPI
  – Queue system
• Should offer monitoring tools
Comparison
Cluster   Installation   Node adding                    MPI   Queuing system   Monitoring tool
OSCAR     GUI            GUI + network listening        Yes   Yes              Yes
Rocks     GUI            UI + network listening         Yes   Yes              Yes
xCAT      Command line   Command line + manual adding   No    No               No
Monitoring Tools
• Web based
  – Easy access
• Rich Internet Application
• Alert sending
• Customizable
  – Plug-ins
• Monitoring focus
Comparison
Tool      Web based   RIA   Send alert   Plugins   Monitoring focus
Cacti     Yes         No    No           Yes       Network
Ganglia   Yes         No    No           No        Cluster/Grid
Nagios    Yes         No    Yes          Yes       Network
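Alert sending is the column where Nagios differs from Cacti and Ganglia in the comparison above. At its core it means comparing each collected metric against a threshold and notifying an administrator when one is crossed. The Python sketch below illustrates only that idea, under stated assumptions: metrics are assumed to arrive already collected as a per-node dictionary, and `Threshold`, `check_alerts`, the metric names and the sample values are hypothetical, not part of Cacti, Ganglia or Nagios. The warning/critical levels follow a common monitoring convention rather than any particular tool's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical per-metric limits: values at or above warn/crit raise an alert."""
    metric: str
    warn: float
    crit: float


def check_alerts(samples, thresholds):
    """Return (node, metric, level, value) tuples for every threshold crossed.

    samples: {node_name: {metric_name: value}}, assumed to be collected elsewhere.
    """
    alerts = []
    for node, metrics in sorted(samples.items()):
        for t in thresholds:
            value = metrics.get(t.metric)
            if value is None:
                # A metric that stopped reporting is flagged instead of skipped.
                alerts.append((node, t.metric, "UNKNOWN", None))
            elif value >= t.crit:
                alerts.append((node, t.metric, "CRITICAL", value))
            elif value >= t.warn:
                alerts.append((node, t.metric, "WARNING", value))
    return alerts


if __name__ == "__main__":
    limits = [Threshold("load1", warn=8.0, crit=16.0),
              Threshold("disk_used_pct", warn=85.0, crit=95.0)]
    readings = {
        "c0n0": {"load1": 2.5, "disk_used_pct": 40.0},
        "c0n1": {"load1": 17.2, "disk_used_pct": 90.0},
        "c1n0": {"load1": 1.0},
    }
    for alert in check_alerts(readings, limits):
        print(alert)  # a real system would e-mail or page an administrator here
```

Treating a missing metric as UNKNOWN rather than ignoring it is a deliberate choice: on a cluster with thousands of nodes, a node that stops reporting is often the first visible sign of a failure.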
Proprietary Solutions
• Usually built on open source applications
  – OSCAR, Rocks, xCAT, Ganglia, Nagios, Cacti…
• Tune the cluster configuration
• Proprietary tools for administration
  – Hardware specific
• Poor integration
  – Different vendors
Challenges for an HPC Environment
• Increasing number of processors
• Heterogeneous environments
  – Resources from different machines
• Particular/local tools
• Administrators with different levels of knowledge
• Present the available resources as a whole
  – Organized and customized
• Node naming and organization
  – Simple form: a flat sequence of names (node 0, node 1, …)
  [Figure: an example with node 0 to node 8, and its expansion to node 0 to node 12]
• Node naming and organization
  – Hierarchical approach: names of the form c<i>n<j>
  [Figure: an example with c0n0 to c0n4 and c1n0 to c1n3, and its expansion to c0n0 to c0n6 and c1n0 to c1n5]
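The hierarchical scheme can be expressed in a few lines of code. This is a minimal Python sketch, assuming names of the form c<group>n<index> as in the example; what the "c" level stands for (cabinet, chassis, sub-cluster) is not stated on the slide, and the helpers `node_name`, `parse_name`, `expand` and `by_group` are hypothetical. It shows the property the example illustrates: growing one group continues that group's own numbering and never renames existing nodes.

```python
from __future__ import annotations

import re

# Hypothetical naming convention taken from the example: c<group>n<index>, e.g. c1n4.
NAME_RE = re.compile(r"^c(\d+)n(\d+)$")


def node_name(group: int, index: int) -> str:
    """Build a hierarchical node name such as 'c0n3'."""
    return f"c{group}n{index}"


def parse_name(name: str) -> tuple[int, int]:
    """Split 'c1n4' into (1, 4); reject names that do not follow the scheme."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a hierarchical node name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def expand(nodes: list[str], group: int, count: int) -> list[str]:
    """Add `count` nodes to `group`, continuing that group's own numbering.

    Existing names are left untouched; only new names are appended.
    """
    used = [index for g, index in map(parse_name, nodes) if g == group]
    start = max(used) + 1 if used else 0
    return nodes + [node_name(group, i) for i in range(start, start + count)]


def by_group(nodes: list[str]) -> dict[int, list[int]]:
    """Organize names into {group: sorted node indices}, a tree view of the cluster."""
    tree: dict[int, list[int]] = {}
    for name in nodes:
        group, index = parse_name(name)
        tree.setdefault(group, []).append(index)
    return {group: sorted(indices) for group, indices in sorted(tree.items())}
```

For the example above, starting from c0n0 to c0n4 and c1n0 to c1n3 and adding two nodes to each group appends c0n5, c0n6, c1n4 and c1n5, matching the expanded layout, while every original name keeps its meaning.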
LEMMing Project
• Inspired by the Zimbra Collaboration Suite
• Uses open source tools
• Uses AJAX technologies
• LEMMing is not an extension
  – Less dependent
  – Great usability
What is LEMMing?
• Linux Enterprise Management and Monitoring
• Clusters with thousands of nodes
  – Many failures
• Flexibility
• Ease of adding features
• Great usability
  – Detect and solve problems faster
Features
• Freeware
• Web-service based
• AJAX interface design
• Integration of other tools
• Single point of management
• Tested with Rocks clusters
• Support for organizing many cluster topologies
• Integrated with workload management
• Parallel shell tools
• Customizable dashboard
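Parallel shell tools run the same command on many nodes at once and condense the results. The sketch below shows the general pattern in Python, assuming password-less ssh access to every node; it is not LEMMing's implementation, and `ssh_runner`, `parallel_shell` and `group_by_output` are hypothetical names. The runner is passed in as a parameter so the fan-out logic can be exercised without a real cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command, timeout=30):
    """Run `command` on `node` over ssh; assumes key-based, non-interactive login."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def parallel_shell(nodes, command, runner=ssh_runner, max_workers=32):
    """Run `command` on every node concurrently and return {node: (returncode, output)}."""

    def run_one(node):
        try:
            return node, runner(node, command)
        except Exception as exc:  # unreachable host, ssh timeout, missing ssh binary...
            return node, (-1, f"error: {exc}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, nodes))


def group_by_output(results):
    """Collapse nodes that returned the same (returncode, output) into one entry."""
    groups = {}
    for node, outcome in sorted(results.items()):
        groups.setdefault(outcome, []).append(node)
    return groups
```

A call such as `group_by_output(parallel_shell(nodes, "uname -r"))` would report one entry per distinct kernel version. Grouping identical outputs, similar in spirit to `dshbak` from the pdsh package, keeps the result readable when a command runs on thousands of nodes.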
LEMMing Modules
• LEMM-WS
  – Web services
  – Coupled to the supercomputer
  – API
• LEMM-GATE
  – Web application
  – Independent of the cluster
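The split between the two modules can be sketched as an interface plus an aggregator. In this hypothetical Python model, `ClusterService` plays the role of a LEMM-WS endpoint (one per cluster) and `Gateway` plays the role of LEMM-GATE, which talks only to that interface and therefore stays independent of any particular cluster. The method name `node_status`, the in-memory cluster and the summary format are assumptions made for illustration; the actual LEMM-WS API is not described on the slide.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class ClusterService(Protocol):
    """Hypothetical stand-in for a LEMM-WS endpoint: one per cluster, coupled to it."""

    name: str

    def node_status(self) -> Dict[str, str]:
        """Return {node_name: state} for every node of this cluster."""
        ...


@dataclass
class InMemoryCluster:
    """Toy ClusterService; a real endpoint would answer web-service requests."""

    name: str
    status: Dict[str, str]
    reachable: bool = True

    def node_status(self) -> Dict[str, str]:
        if not self.reachable:
            raise ConnectionError(f"{self.name}: web service unreachable")
        return dict(self.status)


@dataclass
class Gateway:
    """Hypothetical stand-in for LEMM-GATE: it only knows the ClusterService API."""

    clusters: List[ClusterService] = field(default_factory=list)

    def overview(self) -> Tuple[Dict[str, Dict[str, int]], List[str]]:
        """Count node states per cluster and list clusters that could not be reached."""
        summary: Dict[str, Dict[str, int]] = {}
        unreachable: List[str] = []
        for cluster in self.clusters:
            try:
                states = cluster.node_status().values()
            except ConnectionError:
                unreachable.append(cluster.name)  # degrade the view, do not fail it
                continue
            summary[cluster.name] = dict(Counter(states))
        return summary, unreachable


if __name__ == "__main__":
    gate = Gateway([
        InMemoryCluster("cluster-a", {"c0n0": "up", "c0n1": "down", "c1n0": "up"}),
        InMemoryCluster("cluster-b", {"n0": "up"}, reachable=False),
    ])
    print(gate.overview())
```

Because the gateway depends only on the interface, clusters from different vendors or with different topologies could be added by providing another service implementation, and an unreachable cluster shows up in the overview instead of breaking it.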
LEMMing Modules Relationship
LEMM-GATE interface
Conclusion
• Huge HPC centers
  – Heterogeneous machines
  – Many nodes per cluster
• LEMMing
  – Integrates the management and monitoring software stacks of multiple clusters
  – Rich Internet Application
  – Open source model
  – Reuses available tools
Future Work
• Add support for different cluster systems
• IPMI support
• Queue management
• Visit us:
  – http://lemm.sf.net
  – Check the video demonstration
Acknowledgments
• The authors thank:
  – High Performance Computing Center
  – Professor Alvaro Coutinho
  – DELL Brazil
Thanks!