
Monitoring Systems and POWER5/6 LPARs with Ganglia, Michael Perzl



  1. Monitoring Systems and POWER5/6 LPARs with Ganglia Michael Perzl – michael@perzl.org

  2. Agenda
     - Ganglia – what is it?
     - Ganglia components and data flow
     - An introduction to RRDTool
     - Ganglia metrics – what can be measured?
     - New POWER5/6 metrics (AIX & Linux)
     - Extending Ganglia with gmetric
     - Adding device-specific information to Ganglia
     - Ganglia network communication
     - Installation issues
     - Where to get Ganglia for AIX and Linux on POWER?
     - Best practices
     - Future additions / plans
     - Discussion
     - Links

  3. Ganglia – what is it?

  4. Ganglia – what is it? (1/3)
     - Ganglia is an open source cluster performance monitoring tool. It has been extended to cover POWER5/6 features such as shared processor LPARs, entitlement and physical CPU usage.
     - This session covers:
       – the technical details of Ganglia and the POWER5/6 extensions
       – how to set it up and use it to monitor all LPARs in a single machine as well as across many machines

  5. Ganglia – what is it? (2/3)
     Ganglia properties:
     - scalable distributed monitoring system for high-performance computing systems such as clusters and grids
     - based on a hierarchical design targeted at federations of clusters
     - relies on a multicast-based listen/announce protocol to monitor state within clusters, and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state
     - leverages widely used technologies such as:
       – XML for data representation
       – XDR (eXternal Data Representation) for compact, portable data transport
       – RRDtool for data storage and visualization
     - uses carefully engineered data structures and algorithms to achieve very low per-node overhead and high concurrency
     - robust implementation
     - open source, written in C
       – downloaded 110,000+ times, 145+ countries, 500+ clusters, 2000+ nodes
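To make "compact, portable data transport" concrete: XDR (RFC 4506) encodes integers as 4-byte big-endian values and strings as a 4-byte length followed by the bytes, zero-padded to a multiple of four. The minimal Python sketch below shows only those two encoding rules. The three-string message in `encode_metric` and the names in it are hypothetical and are not Ganglia's actual UDP packet layout.

```python
import struct


def xdr_uint(n: int) -> bytes:
    """Encode an unsigned 32-bit integer as XDR: 4 bytes, big-endian."""
    return struct.pack(">I", n)


def xdr_string(s: str) -> bytes:
    """Encode a string as XDR: 4-byte length, the bytes, zero padding to a 4-byte boundary."""
    data = s.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_metric(host: str, name: str, value: str) -> bytes:
    """Hypothetical message: three XDR strings back to back (NOT Ganglia's real packet format)."""
    return xdr_string(host) + xdr_string(name) + xdr_string(value)


packet = encode_metric("lpar01", "load_one", "0.42")
print(len(packet), packet.hex())
```

Every field is aligned to 4 bytes and has a fixed byte order. That is what lets a big-endian POWER LPAR and a little-endian x86 node decode each other's messages without negotiation.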

  6. Ganglia – what is it? (3/3)
     Ganglia properties (cont.):
     - has been ported to an extensive set of operating systems and processor architectures:
       – AIX, Darwin, FreeBSD, HP-UX, IRIX, Linux, OSF, NetBSD, Solaris, Windows (via Cygwin)
     - is currently in use on more than 500 clusters around the world
     - has been used to link clusters across university campuses and around the world, and can scale to clusters with 2000+ nodes
       – see http://ganglia.info/ for more details

  7. Ganglia components and data flow

  8. Ganglia components
     The Ganglia system consists of:
     - two unique daemons:
       – Ganglia Monitoring Daemon (gmond)
         • monitoring daemon that collects the metrics
         • runs on each node
       – Ganglia Meta Daemon (gmetad)
         • polls the gmond daemons and stores the collected metrics in Round-Robin Databases (RRDs)
     - a PHP-based web frontend
     - a few other small utility programs:
       – gmetric
         • can be used to easily extend Ganglia with additional user-defined metrics
       – gstat
       – gexec
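gmetric is a command-line tool, so a script can publish a custom metric by building and running a gmetric command line. The minimal Python sketch below assumes a `gmetric` binary in the PATH that accepts the long options `--name`, `--value`, `--type` and `--units`. Check `gmetric --help` on your installation. The metric name `root_fs_used_pct` is hypothetical.

```python
import shutil
import subprocess


def gmetric_command(name: str, value, mtype: str = "float", units: str = "") -> list:
    """Build a gmetric command line for one user-defined metric."""
    cmd = ["gmetric", f"--name={name}", f"--value={value}", f"--type={mtype}"]
    if units:
        cmd.append(f"--units={units}")
    return cmd


def publish(name: str, value, mtype: str = "float", units: str = "") -> None:
    """Send the metric via gmetric if it is installed; otherwise only print the command."""
    cmd = gmetric_command(name, value, mtype, units)
    if shutil.which("gmetric") is None:
        print("gmetric not found, would run:", " ".join(cmd))
        return
    subprocess.run(cmd, check=True)


# Hypothetical metric: percentage of the root file system in use
usage = shutil.disk_usage("/")
publish("root_fs_used_pct", round(100.0 * usage.used / usage.total, 1), units="%")
```

Run from cron or a loop, a wrapper like this turns any value a script can compute into a Ganglia metric. gmetad then stores and graphs it like a built-in metric.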

  9. Ganglia – Schematic View
     [Figure: schematic view of Ganglia. Source: “Ganglia: Past, Present and Future” by Matt Massie, http://ganglia.info/talks/lug_lbl_talk/]

  10. Ganglia Architecture
     [Figure: Ganglia architecture diagram]

  11. Ganglia Monitoring Daemon (gmond)
     - The Ganglia Monitoring Daemon (gmond) is a multi-threaded daemon that runs on each cluster node you want to monitor.
     - Installation is easy:
       – just the daemon and a configuration file (/etc/gmond.conf)
     - gmond has four main responsibilities:
       1. monitor changes in host state
       2. announce relevant changes
       3. listen to the state of all other Ganglia nodes via a unicast or multicast channel
       4. answer requests for an XML description of the cluster state
     - Each gmond transmits information in two different ways:
       – unicasting or multicasting host state in XDR format using UDP messages
       – sending XML over a TCP connection
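The fourth responsibility, answering requests for an XML description, can be exercised with a plain TCP connection. gmond writes its XML dump and closes the connection. The minimal Python sketch below assumes gmond listens on its usual default TCP port 8649 on localhost. The element and attribute names (HOST, METRIC, NAME, VAL) follow gmond's XML output as commonly seen, so verify them against your own installation.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host: str = "localhost", port: int = 8649, timeout: float = 5.0) -> bytes:
    """Connect to gmond and read its XML dump until the daemon closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_data) -> dict:
    """Map each HOST name to a {metric name: value string} dict."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
    return result


if __name__ == "__main__":
    try:
        for hostname, metrics in metrics_by_host(fetch_xml()).items():
            print(hostname, metrics.get("load_one"))
    except OSError as exc:  # gmond not running, port closed, timeout, ...
        print("could not reach gmond:", exc)
```

The same fetch works against gmetad's XML port (8651 by default), which returns the aggregated tree that the web frontend parses.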

  12. Ganglia Meta Daemon (gmetad) (1/2)
     - The Ganglia Meta Daemon (gmetad) typically runs on only one specific cluster node, or on several nodes in a staged (hierarchical) setup.
     - Installation is easy:
       – just the daemon and a configuration file (/etc/gmetad.conf)
     - Federation in Ganglia is achieved using a tree of point-to-point connections amongst representative cluster nodes to aggregate the state of multiple clusters.
     - At each node in the tree, gmetad:
       – periodically polls a collection of child data sources
       – parses the collected XML
       – saves all numeric, volatile metrics to round-robin databases
       – exports the aggregated XML over a TCP socket to clients
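Only numeric metrics are worth keeping in round-robin databases. A string metric such as the OS name has no time series to store. The minimal Python sketch below shows that filtering step on a parsed XML tree. It assumes each METRIC element carries a TYPE attribute with values such as "string", "float" or "uint32", as in gmond's XML. It illustrates the idea and is not gmetad's C code. The cluster and host names in the sample are made up.

```python
import xml.etree.ElementTree as ET

# Numeric metric types as used by gmond/gmetric (assumed list; "string" is excluded)
NUMERIC_TYPES = {"int8", "uint8", "int16", "uint16", "int32", "uint32", "float", "double"}


def numeric_samples(xml_data) -> dict:
    """Return {(cluster, host, metric): float} for every numeric METRIC in the tree."""
    root = ET.fromstring(xml_data)
    samples = {}
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                if metric.get("TYPE") in NUMERIC_TYPES:
                    key = (cluster.get("NAME"), host.get("NAME"), metric.get("NAME"))
                    samples[key] = float(metric.get("VAL"))
    return samples


# Illustrative input only
SAMPLE = """<GANGLIA_XML>
  <CLUSTER NAME="p570">
    <HOST NAME="lpar01">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
      <METRIC NAME="os_name" VAL="AIX" TYPE="string"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

print(numeric_samples(SAMPLE))
```

Each key maps naturally onto one RRD file per cluster, host and metric, which is the kind of layout a poller would update on every cycle.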

  13. Ganglia Meta Daemon (gmetad) (2/2)
     - Data sources may be either:
       – gmond daemons, representing specific clusters, or
       – other gmetad daemons, representing sets of clusters
     - Data sources use source IP addresses for access control
       – Multiple IP addresses can be specified per data source for failover
       – This failover capability comes naturally when aggregating data from clusters: each gmond daemon holds the entire state of its cluster, so any reachable one can answer
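In gmetad.conf a data source line can list several `host:port` pairs, for example `data_source "my cluster" node1:8649 node2:8649`. The poller can then try them in order until one answers. The minimal Python sketch below shows that failover loop. The `fetch` callable and the host names are hypothetical stand-ins, for instance a function that reads XML from a TCP port.

```python
from typing import Callable, Sequence


def poll_with_failover(addresses: Sequence[str], fetch: Callable[[str], bytes]) -> tuple:
    """Try each address in order; return (address, data) from the first that succeeds."""
    errors = []
    for address in addresses:
        try:
            return address, fetch(address)
        except OSError as exc:  # connection refused, timeout, unreachable host, ...
            errors.append(f"{address}: {exc}")
    raise ConnectionError("all addresses failed: " + "; ".join(errors))


# Hypothetical fetch function: node1 is down, node2 answers
def fake_fetch(address: str) -> bytes:
    if address.startswith("node1"):
        raise ConnectionRefusedError("connection refused")
    return b"<GANGLIA_XML/>"


print(poll_with_failover(["node1:8649", "node2:8649"], fake_fetch))
```

Because any gmond in the cluster returns the full cluster state, it does not matter which address answers. The first successful reply is as good as any other.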

  14. Ganglia PHP web frontend (1/2)
     Web frontend properties:
     - provides a view of the gathered information via real-time dynamic web pages
     - displays Ganglia data in a meaningful way for system administrators and users
       – for example, one can view the CPU utilization over the past hour, day, week, month or year
       – the web frontend shows similar graphs for memory usage, disk usage, network statistics, number of running processes and all other Ganglia metrics

  15. Ganglia PHP web frontend (2/2)
     Web frontend properties (cont.):
     - depends on gmetad, which provides it with data from several Ganglia sources
     - connects to local port 8651 (the gmetad XML port, by default) and expects to receive a Ganglia XML tree
     - the web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site
       – this makes the site very responsive, but requires the full XML tree to be parsed on every page access
       – therefore, the Ganglia web frontend should run on a fairly powerful, dedicated machine if it presents a large amount of data
     - is written in PHP and draws its history graphs from the round-robin databases maintained by gmetad
     - has been tested on many flavors of Unix (primarily Linux) with the Apache web server and the PHP 4.1 module

  16. Ganglia - data flow (1/4)
     [Diagram: one gmond daemon per node/LPAR reads /etc/gmond.conf and collects statistics through the operating system's performance stats API. Legend: file access, network, web.]

  17. Ganglia - data flow (2/4)
     [Diagram: the gmond setup from (1/4), plus gmetad, which runs on the web server, reads /etc/gmetad.conf and keeps a database of statistics with rrdtool; a browser is the consumer. Legend: file access, network, web.]

  18. Ganglia - data flow (3/4)
     [Diagram: the gmond/gmetad/rrdtool setup from (2/4), plus the Ganglia frontend scripts running on Apache2 + PHP5, which serve the browser. Legend: file access, network, web.]

  19. Ganglia - data flow (4/4)
     [Diagram: the complete setup from (3/4), plus the gmetric user command, which feeds additional metrics into gmond. Legend: file access, network, web.]

  20. Ganglia - data flow again
     [Diagram: several gmond daemons, one per node/LPAR, each with its own /etc/gmond.conf; only one gmetad instance (with /etc/gmetad.conf and the rrdtool database of statistics) runs together with the web server, where the PHP scripts on Apache2 + PHP5 serve the browser. Legend: file access, network, web.]

  21. An introduction to RRDTool

  22. RRDTool
     - Homepage: http://oss.oetiker.ch/rrdtool/
     - RRD is the acronym for Round-Robin Database.
     - RRD is a system to store and display time-series data (e.g., network bandwidth, machine-room temperature, server load average).
     - It stores the data in a very compact way that does not expand over time (fixed database size), and it presents useful graphs by processing the data to enforce a certain data density.
     - It can be used either via simple wrapper scripts (from shell or Perl) or via frontends that poll network devices and put a friendly user interface on it.
     RRDTool is the de facto industry standard for storing and displaying time-series data!
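The fixed database size comes from storing values in ring buffers. Each archive has a set number of rows, and once it is full the oldest row is overwritten. Coarser archives hold consolidated values, such as the average of several samples. The minimal Python sketch below shows that idea with AVERAGE consolidation. It is a simplification (no heartbeat, no unknown values, no on-disk format) and not RRDtool's implementation. With the real tool, a comparable layout would be created with something like `rrdtool create load.rrd --step 15 DS:load:GAUGE:120:U:U RRA:AVERAGE:0.5:1:240 RRA:AVERAGE:0.5:4:240`.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive: keeps at most `rows` consolidated values, overwriting the oldest."""

    def __init__(self, rows: int, steps_per_row: int):
        self.values = deque(maxlen=rows)    # ring buffer: never grows beyond `rows` entries
        self.steps_per_row = steps_per_row  # primary samples averaged into one stored row
        self._pending = []                  # samples waiting to be consolidated

    def add_sample(self, value: float) -> None:
        """Collect one primary sample; store their AVERAGE once enough have arrived."""
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            self.values.append(sum(self._pending) / len(self._pending))
            self._pending.clear()


fine = RoundRobinArchive(rows=3, steps_per_row=1)    # full resolution, last 3 samples
coarse = RoundRobinArchive(rows=3, steps_per_row=4)  # one average per 4 samples
for sample in [1, 2, 3, 4, 5, 6, 7, 8]:
    fine.add_sample(sample)
    coarse.add_sample(sample)

print(list(fine.values))    # → [6.0, 7.0, 8.0]
print(list(coarse.values))  # → [2.5, 6.5]
```

The fine archive shows recent detail and the coarse one covers a longer period in the same amount of space. This is how the Ganglia frontend can offer hour, day, week, month and year views without the databases growing.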

  23. RRDTool example graph
     [Graph taken from http://oss.oetiker.ch/rrdtool/gallery/index.en.html]
     The graph shows inbound and outbound call traffic through the switch via the 6 trunks connected to the Diamond exchange. Inbound traffic is shown as positive values using a lowest-free fill method; outbound traffic is shown as negative values using a distributed fill method. Technical details are on RRDtrac.
