
Better Integration of Systems Management Hardware with Linux

LINUXCON NORTH AMERICA, Aug 2014. Charles Rose, Engineer, Dell Inc.


  1. Better Integration of Systems Management Hardware with Linux
     LINUXCON NORTH AMERICA, Aug 2014
     Charles Rose, Engineer, Dell Inc.

  2. Agenda
     • Introduction
       – Systems Management Hardware/Software
       – Information Available to the Service Processor
     • The Need for Better Integration
       – Integration of the Service Processor with Linux
       – Managing Servers In-band and Out-of-band
     • Current State
       – IPMI
       – Exchange of information between OS and Service Processor
       – System Recovery/Debug
       – SNMP Redirection
       – USB NIC Pass-through
       – Server Health
     • Future Features
       – OS Event logging in Service Processor
       – Aid with Diagnostic/Debugging
       – Automatic Configuration of console redirection

  3. Introduction

  4. Systems Management Hardware/Software
     • Systems Management Hardware on Server systems:
       – Helps manage, monitor, update and deploy Servers.
       – Provides remote management and configuration options.
       – Independent of the presence and status of the Operating System.
       – Referred to as Service Processor/Baseboard Management Controller (BMC)
     • Interfaces/API
       – IPMI
       – CIM
       – WSMAN
       – SSH
       – SNMP
       – Telnet
       – VNC
       – Web UI

  5. Information Available in the Service Processor
     • Server Hardware
       – CPU
       – RAM
       – Storage/RAID Controller
       – NIC
       – Converged Network Adapter/Fibre Channel
     • Server Firmware
       – BIOS
       – Service Processor
       – NIC, Storage Controller
     • Server Software
       – NIC IP, drivers

  6. The Need for Better Integration

  7. Integration of the Service Processor with Linux
     • Servers can be managed:
       – Over the systems management interface (IPMI, CIM, SNMP): out-of-band.
       – Over the OS’s network interface (SNMP, CIM, etc.): in-band.
     • Managing in-band or out-of-band should not result in loss of Operating System information/functionality.
     • OS information should be available in the Service Processor.
     • Service Processor information should be available in the OS.
     • Eliminate the need for any proprietary agents on the OS.
     • Utilize the OS to Service Processor pass-through network.
       – LAN On Motherboard.
       – Virtual USB NIC.
     • Security considerations.
     [Diagram: Server Hardware and Service Processor, with in-band and out-of-band paths]

  8. Managing Servers In-band and Out-of-band
     [Diagram: a Management Console and three managed servers, each shown as Operating System, Server Hardware and Service Processor, with in-band and out-of-band management paths]

  9. Current Status

  10. IPMI
     IPMI kernel module autoload
     • Older systems required OpenIPMI’s startup script to load the ipmi kernel modules.
     • Kernel 3.10 and later will autoload the ipmi modules:
       – ipmi_devintf
       – ipmi_si
       – ipmi_msghandler
     • Simplifies IPMI’s use in installation/livecd environments.
     • ipmi_watchdog does not yet load automatically.
       – TODO: autoload ipmi_watchdog
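
To see which of the modules named on this slide are actually loaded on a given machine, something like the following sketch could be used. It only parses text in the format of /proc/modules; the function names are made up for illustration, and a module compiled into the kernel (rather than built as a loadable module) will not appear in that file at all.

```python
from pathlib import Path

# Modules the slide lists as autoloaded on kernel 3.10 and later, plus the
# watchdog module that the slide says is not yet autoloaded.
AUTOLOADED = ("ipmi_devintf", "ipmi_si", "ipmi_msghandler")
NOT_AUTOLOADED = ("ipmi_watchdog",)


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def ipmi_module_report(proc_modules_text):
    """Map each IPMI module of interest to True (loaded) or False."""
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in AUTOLOADED + NOT_AUTOLOADED}


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        for name, present in ipmi_module_report(path.read_text()).items():
            print(f"{name}: {'loaded' if present else 'not loaded'}")
```

If ipmi_watchdog shows as not loaded, that matches the TODO above: it still has to be loaded explicitly.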

  11. Exchange Information between OS and Service Processor
     • What OS is running on a server?
     • What is the Service Processor’s IP/URL?
     • OS information is set in the Service Processor
       – System Host Name
       – Operating System
       – Operating System Version
     • Service Processor’s IP/URL is exported to the OS
     • /etc/init.d/exchange-bmc-os-info
       – ipmitool/contrib
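
A rough sketch of the OS-to-Service-Processor half of this exchange follows. It is not the exchange-bmc-os-info script itself. It assumes that ipmitool's `mc setsysinfo` subcommand accepts the `system_name`, `os_name` and `delloem_os_version` parameters; that assumption should be checked against the ipmitool man page on the target system. The function name is hypothetical, and the sketch only builds the command lines without running them.

```python
import platform
import socket


def setsysinfo_commands(hostname, os_name, os_version):
    """Build ipmitool argument lists that would push OS details to the BMC.

    Assumption (verify locally): `ipmitool mc setsysinfo` supports the
    system_name, os_name and delloem_os_version parameters.
    """
    return [
        ["ipmitool", "mc", "setsysinfo", "system_name", hostname],
        ["ipmitool", "mc", "setsysinfo", "os_name", os_name],
        ["ipmitool", "mc", "setsysinfo", "delloem_os_version", os_version],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so that nothing is
    # written to a Service Processor by accident.
    commands = setsysinfo_commands(
        socket.gethostname(), platform.system(), platform.release()
    )
    for argv in commands:
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` once the parameter names have been confirmed for the installed ipmitool version.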

  12. System Recovery/Debug
     • On OS lock-up, capture information that can aid with debugging.
     • Watchdog timer facility provided by the Service Processor.
     • Unlike the Chipset Watchdog (iTCO), it does more than just reset the system.
       – Record failure in the System Event Log
       – Send alerts over SNMP/SMS/Phone, etc.
       – Capture VGA as a JPEG, capture video.

  13. System Recovery/Debug
     • IPMI driver has had support to detect/log kernel panic events for years.
     • Linux Watchdog API: ipmi_watchdog.ko
       – /dev/watchdog interface to the Service Processor.
       – Watchdog pings converted to KCS messages to BMC.
       – Traditionally required agents in OS to send KCS messages to BMC.
       – watchdogd or systemd can act as watchdog daemons in the OS.
     • Can co-exist with/supplement kdump/kexec, requires some guess work.
     • TODO: Update ipmi_watchdog.ko to support multi-watchdog.
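
What a watchdog daemon does with /dev/watchdog can be sketched as below, using the generic Linux watchdog API: any write counts as a ping, and writing the character `V` just before closing asks the driver to disarm the timer (this "magic close" only works if the driver supports it and "nowayout" is not set). The function name and the in-memory stand-in device are illustrative; this is not how watchdogd or systemd are implemented.

```python
import io
import time

MAGIC_CLOSE = b"V"  # Linux watchdog API: request disarm on close


def pet_watchdog(device, pings, interval=0.0):
    """Ping a watchdog device `pings` times, then request a clean disarm."""
    for _ in range(pings):
        device.write(b"\0")  # any write resets the countdown
        device.flush()
        time.sleep(interval)
    device.write(MAGIC_CLOSE)  # must be the last write before close
    device.flush()


if __name__ == "__main__":
    # Demonstrate against an in-memory buffer. On a real system the device
    # would be opened with open("/dev/watchdog", "wb", buffering=0), which
    # arms the timer, so only do that deliberately.
    fake = io.BytesIO()
    pet_watchdog(fake, pings=3)
    print(fake.getvalue())
```

With ipmi_watchdog loaded, each of these writes is what gets converted into a message to the BMC, so no separate agent is needed.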

  14. SNMP Redirection
     • Service Processor has exhaustive hardware information.
     • OS contains information for resources it manages.
     • Many Management Consoles communicate with the OS’s SNMP agent.
     • Hardware health/inventory information available to the OS is limited/non-exhaustive.
     • Service Processor’s OID is grafted as part of the OS’s SNMP MIB.
     • Traps from the Service Processor can be configured to reach the network’s Trap Sink.
     • Hardware health is now available to the management console.
     • Support SNMP v2 and v3.
     [Diagram: Management Console sends SNMP get/set to, and receives TRAPs from, the Operating System, which acts as SNMP proxy and TRAP forwarder for the Service Processor on the Server Hardware]

  15. SNMP Redirection – Operation
     Get/Set
     • Enable SNMP on the Service Processor
     • “proxy” get/set SNMP requests to the Service Processor’s IP for a subset of OID
     • SNMPv2-SMI::enterprises.674.10892
     Trap
     • Enable snmptrapd to accept traps from the Service Processor’s IP.
     • “forward” traps to the sink configured on the host.
     • Enable SNMP Alerting on the Service Processor
     • ipmitool-1.8.15
       – contrib/bmc-snmp-proxy
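
The two halves of this setup map onto Net-SNMP's `proxy` directive in snmpd.conf and the `authCommunity` and `forward` directives in snmptrapd.conf. The sketch below only generates candidate configuration lines. It is not the contrib/bmc-snmp-proxy script, and the community string, addresses and function names are placeholders chosen for illustration.

```python
# SNMPv2-SMI::enterprises is 1.3.6.1.4.1, so the subtree named on the
# slide is written numerically as:
DELL_SUBTREE = ".1.3.6.1.4.1.674.10892"


def snmpd_proxy_line(bmc_ip, community, oid=DELL_SUBTREE):
    """snmpd.conf line that proxies get/set for `oid` to the Service Processor."""
    return f"proxy -v 2c -c {community} {bmc_ip} {oid}"


def snmptrapd_lines(bmc_ip, community, trap_sink):
    """snmptrapd.conf lines that accept traps from the BMC and forward them."""
    return [
        # Accept (log and forward over the network) traps from this source.
        f"authCommunity log,net {community} {bmc_ip}",
        # Forward every received trap to the configured sink.
        f"forward default {trap_sink}",
    ]


if __name__ == "__main__":
    # Placeholder values: a link-local BMC address and a hypothetical sink.
    print(snmpd_proxy_line("169.254.0.1", "public"))
    for line in snmptrapd_lines("169.254.0.1", "public", "192.0.2.10"):
        print(line)
```

A production setup would also need SNMP and alerting enabled on the Service Processor itself, as the slide notes, and would probably use SNMPv3 credentials rather than a shared community string.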

  16. USB NIC Pass-Through
     • Dedicated channel for OS – Service Processor communication
     • Service Processor at 169.254.0.1 (default). Non-routable.
     • Automatic configuration with Avahi and nss-mdns or NetworkManager.
     • Service Processor can be reached with “idrac.local”
       – http://idrac.local
       – # ipmitool -I lan -H idrac.local
       – # snmpget idrac.local
     [Diagram: Operating System connected over a USB NIC to the Service Processor on the Server Hardware]
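
The "non-routable" remark follows from the default address sitting in the IPv4 link-local range 169.254.0.0/16. A small check with Python's standard `ipaddress` module makes that concrete; the helper name is made up for this example.

```python
import ipaddress


def is_link_local(address):
    """Return True if `address` is in a link-local (non-routable) range."""
    return ipaddress.ip_address(address).is_link_local


if __name__ == "__main__":
    for addr in ("169.254.0.1", "192.168.0.120"):
        kind = "link-local" if is_link_local(addr) else "not link-local"
        print(f"{addr}: {kind}")
```

Because the address never leaves the host, the pass-through channel stays private to the OS and its own Service Processor.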

  17. System Health
     • Health of CPU, Fan, Temp, Voltages, etc. available already
     • Aggregate the above into a “System Health” machine-readable value.
     • Available in-band and/or out-of-band
     • Can be used by cluster software, virtualization managers, cloud compute managers to perform workload migration decisions
     • Available over SNMP or IPMI
     • SNMP redirection can make health available in-band
     [Diagram: Health reported by the Service Processor on the Server Hardware and made available to the Operating System]

  18. System Health over IPMI and SNMP
     • IPMI
       – raw 0x30 0x51
       – Byte 5: Global and Storage status
         Bit 0 set = Storage status Normal
         Bit 1 set = Storage status Error (non-critical)
         Bit 2 set = Storage status Failed (critical)
         Bit 3 set = Storage status Unknown
         Bit 4 set = Global status Normal
         Bit 5 set = Global status Error (non-critical)
         Bit 6 set = Global status Failed (critical)
         Bit 7 set = Global status Unknown
     • SNMP
       – SNMPv2-SMI::enterprises.674.10892.5.2.2.0
         1: other -- the status is not one of the below.
         2: unknown -- not known or monitored.
         3: ok -- the status is ok.
         4: nonCritical -- the status is warning, non-critical.
         5: critical -- the status is critical (failure).
         6: nonRecoverable -- the status is non-recoverable (dead).
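
The two encodings above can be decoded with a few lines of code. The sketch below takes byte 5 of the IPMI response as an integer and the SNMP value as an integer; how those values are fetched (ipmitool, an SNMP library) is left out, and the function and table names are illustrative only.

```python
# State names in bit order, as listed on the slide for each nibble.
STATES = ("Normal", "Error (non-critical)", "Failed (critical)", "Unknown")

# Enumeration for SNMPv2-SMI::enterprises.674.10892.5.2.2.0
SNMP_STATUS = {
    1: "other",
    2: "unknown",
    3: "ok",
    4: "nonCritical",
    5: "critical",
    6: "nonRecoverable",
}


def decode_health_byte(value):
    """Split byte 5 into storage (bits 0-3) and global (bits 4-7) states."""
    storage = [STATES[i] for i in range(4) if value & (1 << i)]
    overall = [STATES[i] for i in range(4) if value & (1 << (i + 4))]
    return {"storage": storage, "global": overall}


def decode_snmp_status(value):
    """Translate the SNMP integer into its symbolic name."""
    return SNMP_STATUS.get(value, f"undefined ({value})")


if __name__ == "__main__":
    print(decode_health_byte(0x11))
    print(decode_snmp_status(3))
```

A cluster or virtualization manager could use either value as the single health input mentioned on the previous slide.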

  19. Opportunities…

  20. OS Event Logging in Service Processor
     • Log OS events to the Service Processor to have a better understanding of the host OS:
       – OS Started
       – OS Stopped
       – OS Install Started
       – OS Install Stopped
       – OS Install Aborted
       – OS Install Failed
     • Standard IPMI Sensor Events
     • Combined with OS Name, OS Version and Power Status information, this will help administrators/console software determine server state.
     • SUSE’s YaST2 Hooks

  21. Aid with Debugging
     • OS configuration and logs are crucial for debugging.
     • Logs might be unavailable if the system has locked up or there was a Kernel Panic.
     On application/kernel error:
     • Collect relevant configuration and logs.
     • Store in Service Processor.
     • Accessible out-of-band even with host OS down.

  22. Automatic Configuration of Console Redirection
     • Most headless servers use IPMI Serial Over LAN to access the remote server’s console.
     • BIOS contains options to set up redirection to the serial console.
     • Administrator has to duplicate BIOS setup information on the kernel command line.
       – console=ttyS0,115200
     • Can reduce overhead if the kernel can read BIOS serial port information.
     • ACPI already has SPCR – Serial Port Console Redirection.
     • Linux support was introduced in 2.4 and removed in 2.5.
     • Would be nice to have something similar.
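
The duplication described here can at least be audited: the sketch below pulls the serial console setting out of a kernel command line so it can be compared by hand with the BIOS redirection setup. It assumes the common `console=ttyS<N>,<options>` form; the function name is hypothetical.

```python
from pathlib import Path


def serial_console(cmdline):
    """Return (device, options) of the last serial console= argument, or None."""
    found = None
    for token in cmdline.split():
        if token.startswith("console=ttyS"):
            device, _, options = token[len("console="):].partition(",")
            found = (device, options)  # later arguments override earlier ones
    return found


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print(serial_console(path.read_text()))
```

If the kernel could read the port and baud rate from firmware, as the slide proposes, this manual comparison would no longer be needed.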

  23. References
     • IPMI on Linux
       – http://openipmi.sourceforge.net/IPMI.pdf
       – http://ipmitool.sourceforge.net/
       – http://www.gnu.org/software/freeipmi/
     • Related Projects
       – http://www.openlmi.org/
       – https://github.com/abrt/abrt/wiki/ABRT-Project
     • Scripts
       – Exchange Information – http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/exchange-bmc-os-info.init.redhat
       – SNMP Redirection – http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/bmc-snmp-proxy
       – Installer Status Event logging – http://sourceforge.net/p/ipmitool/patches/97/
       – Fedora Feature Page – http://fedoraproject.org/wiki/Features/AgentFreeManagement
     • Dell iDRAC
       – http://en.community.dell.com/techcenter/systems-management/w/wiki/3204.dell-remote-access-controller-drac-idrac.aspx

  24. Thank You!
     • charles_rose@dell.com
     • linux-poweredge@dell.com

  25. Backup

  26. Server Block Diagram
