Better Integration of Systems Management Hardware with Linux
LinuxCon North America, Aug 2014
Charles Rose, Engineer, Dell Inc.
Agenda
• Introduction
  – Systems Management Hardware/Software
  – Information Available to the Service Processor
• The Need for Better Integration
  – Integration of the Service Processor with Linux
  – Managing Servers In-band and Out-of-band
• Current State
  – IPMI
  – Exchange of Information between OS and Service Processor
  – System Recovery/Debug
  – SNMP Redirection
  – USB NIC Pass-through
  – Server Health
• Future Features
  – OS Event Logging in Service Processor
  – Aid with Diagnostics/Debugging
  – Automatic Configuration of Console Redirection
Introduction
Systems Management Hardware/Software
• Systems management hardware on server systems:
  – Helps manage, monitor, update and deploy servers.
  – Provides remote management and configuration options.
  – Works independently of the presence and status of the Operating System.
  – Referred to as the Service Processor or Baseboard Management Controller (BMC).
• Interfaces/APIs:
  – IPMI
  – CIM
  – WSMAN
  – SSH
  – SNMP
  – Telnet
  – VNC
  – Web UI
Information Available in the Service Processor
• Server hardware:
  – CPU
  – RAM
  – Storage/RAID controller
  – NIC
  – Converged Network Adapter/Fibre Channel
• Server firmware:
  – BIOS
  – Service Processor
  – NIC, storage controller
• Server software:
  – NIC IP, drivers
The Need for Better Integration
Integration of the Service Processor with Linux
• Servers can be managed:
  – Out-of-band: over the systems management interface (IPMI, CIM, SNMP).
  – In-band: over the OS's network interface (SNMP, CIM, etc.).
• Managing a server in-band or out-of-band should not result in a loss of Operating System information or functionality.
• Server OS information should be available in the Service Processor.
• Service Processor information should be available in the OS.
• Eliminate the need for proprietary agents in the OS.
• Use the OS-to-Service Processor pass-through network:
  – LAN On Motherboard.
  – Virtual USB NIC.
• Security considerations.
[Figure: Operating System and Service Processor on the same server hardware, reached in-band and out-of-band]
Managing Servers In-band and Out-of-band
[Figure: a Management Console reaching managed servers (Operating System, Server Hardware, Service Processor) both in-band and out-of-band]
Current Status
IPMI: Kernel Module Autoload
• Older systems required OpenIPMI's startup script to load the IPMI kernel modules.
• Kernel 3.10 and later autoload the IPMI modules:
  – ipmi_devintf
  – ipmi_si
  – ipmi_msghandler
• Simplifies the use of IPMI in installation/live CD environments.
• ipmi_watchdog does not yet load automatically.
  – TODO: autoload ipmi_watchdog
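To check which of the modules named above are actually loaded on a given host, a minimal sketch can parse /proc/modules, whose first field on each line is the module name. The function names are hypothetical; the module list comes from the slide, with ipmi_watchdog included because it still needs loading by hand.

```python
# Module names from the slide; ipmi_watchdog is listed last because it is
# not autoloaded and usually has to be loaded with modprobe.
IPMI_MODULES = ("ipmi_msghandler", "ipmi_si", "ipmi_devintf", "ipmi_watchdog")


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names found in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_ipmi_modules(proc_modules_text: str,
                         wanted: tuple[str, ...] = IPMI_MODULES) -> list[str]:
    """List the wanted IPMI modules that are not currently loaded, in order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]
```

On a live system this would be called as `missing_ipmi_modules(Path("/proc/modules").read_text())`; on a 3.10+ kernel with a BMC present, only ipmi_watchdog would be expected in the result.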
Exchange Information between OS and Service Processor
• What OS is running on a server?
• What is the Service Processor's IP/URL?
• OS information is set in the Service Processor:
  – System host name
  – Operating System
  – Operating System version
• The Service Processor's IP/URL is exported to the OS.
• /etc/init.d/exchange-bmc-os-info
  – ipmitool/contrib
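The exchange can be sketched as building ipmitool commands from the host name and /etc/os-release. This is not the contrib script itself: it assumes ipmitool's `mc setsysinfo`/`mc getsysinfo` subcommands with the parameter names `system_name`, `primary_os_name`, `delloem_os_version` and `delloem_url` (the last two Dell-specific), which should be checked against the installed ipmitool version. Which os-release field feeds which parameter is this sketch's own choice.

```python
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse /etc/os-release-style KEY=value lines, stripping shell quoting."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def sysinfo_commands(hostname: str, os_release: dict[str, str]) -> list[list[str]]:
    """Build (but do not run) ipmitool argv lists that push OS identity to the BMC.

    Assumed parameter names for `ipmitool mc setsysinfo`; verify locally.
    """
    return [
        ["ipmitool", "mc", "setsysinfo", "system_name", hostname],
        ["ipmitool", "mc", "setsysinfo", "primary_os_name", os_release.get("NAME", "Linux")],
        ["ipmitool", "mc", "setsysinfo", "delloem_os_version", os_release.get("VERSION", "")],
    ]


# The other direction: ask the BMC for its URL (assumed Dell OEM parameter).
GET_BMC_URL = ["ipmitool", "mc", "getsysinfo", "delloem_url"]
```

A boot-time service would pass `socket.gethostname()` and the contents of /etc/os-release, then run each argv with `subprocess.run(cmd, check=True)`.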
System Recovery/Debug
• On OS lock-up, capture information that can aid with debugging.
• Watchdog timer facility provided by the Service Processor.
• Unlike the chipset watchdog (iTCO), it does more than just reset the system:
  – Records the failure in the System Event Log.
  – Sends alerts over SNMP/SMS/phone, etc.
  – Captures the VGA screen as a JPEG, or captures video.
System Recovery/Debug
• The IPMI driver has supported detecting/logging kernel panic events for years.
• Linux watchdog API: ipmi_watchdog.ko
  – /dev/watchdog interface to the Service Processor.
  – Watchdog pings are converted to KCS messages to the BMC.
  – Traditionally, agents in the OS were required to send KCS messages to the BMC.
  – watchdogd or systemd can act as the watchdog daemon in the OS.
• Can co-exist with/supplement kdump/kexec, but requires some guesswork.
• TODO: Update ipmi_watchdog.ko to support multiple watchdogs (multi-watchdog).
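What a watchdog daemon does through the Linux watchdog API can be sketched in a few lines: opening /dev/watchdog arms the timer, each write is a keep-alive (with ipmi_watchdog loaded, the driver turns it into an IPMI "Reset Watchdog Timer" request to the BMC), and writing the magic character 'V' before closing asks the driver to disarm. The function name and defaults are hypothetical; a real daemon such as watchdogd or systemd pings indefinitely and only while the system is healthy.

```python
import time


def pet_watchdog(path: str = "/dev/watchdog", pings: int = 3,
                 interval: float = 10.0, magic_close: bool = True) -> None:
    """Send `pings` keep-alives to a Linux watchdog device, then close it.

    If the driver was loaded with nowayout=1, the magic close is ignored and
    the BMC acts on expiry once the pings stop.
    """
    # Unbuffered binary mode so every write reaches the driver immediately.
    with open(path, "wb", buffering=0) as wd:
        for i in range(pings):
            wd.write(b"\0")  # any byte counts as a keep-alive
            if i < pings - 1:
                time.sleep(interval)
        if magic_close:
            wd.write(b"V")  # magic close: request that the timer be disarmed
```

Pointing `path` at an ordinary file makes the write pattern easy to inspect without arming a real timer.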
SNMP Redirection
• The Service Processor has exhaustive hardware information.
• The OS contains information about the resources it manages.
• Many management consoles communicate with the OS's SNMP agent.
• Hardware health/inventory information available to the OS is limited/non-exhaustive.
• The Service Processor's OID is grafted onto the OS's SNMP MIB.
• Traps from the Service Processor can be configured to reach the network's trap sink.
• Hardware health is now available to the management console.
• Supports SNMP v2 and v3.
[Figure: the management console sends SNMP get/set requests to the OS's SNMP agent, which proxies them to the Service Processor; traps from the Service Processor are forwarded to the console]
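Grafting works by OID subtree: requests under the Service Processor's subtree go to the proxy, and everything else is answered by the OS agent. A minimal sketch of that routing test (function names are hypothetical) compares OIDs arc by arc, because a plain string prefix check would wrongly treat `.10892` as a prefix of `.108920`.

```python
# SNMPv2-SMI::enterprises.674.10892 in numeric form (enterprises = 1.3.6.1.4.1).
DELL_SUBTREE = ".1.3.6.1.4.1.674.10892"


def oid_parts(oid: str) -> tuple[int, ...]:
    """Split a dotted numeric OID into integer arcs."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))


def is_proxied(oid: str, subtree: str = DELL_SUBTREE) -> bool:
    """True if `oid` lies inside `subtree`, comparing whole arcs."""
    o, s = oid_parts(oid), oid_parts(subtree)
    return o[:len(s)] == s
```

The system health OID used later in this deck, `.1.3.6.1.4.1.674.10892.5.2.2.0`, falls inside the subtree, so a request for it would be forwarded to the Service Processor.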
SNMP Redirection – Operation
Get/Set
• Enable SNMP on the Service Processor.
• "proxy" get/set SNMP requests for a subset of OIDs to the Service Processor's IP:
  – SNMPv2-SMI::enterprises.674.10892
Trap
• Enable snmptrapd to accept traps from the Service Processor's IP.
• "forward" traps to the sink configured on the host.
• Enable SNMP alerting on the Service Processor.
ipmitool-1.8.15
  – contrib/bmc-snmp-proxy
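The two halves map onto standard Net-SNMP directives: `proxy` in snmpd.conf for get/set, and `authCommunity` plus `forward` in snmptrapd.conf for traps. The sketch below only generates those lines; it is not the bmc-snmp-proxy script, and the SNMPv2c community "public" and the addresses in the usage note are placeholders.

```python
DELL_SUBTREE = ".1.3.6.1.4.1.674.10892"  # SNMPv2-SMI::enterprises.674.10892


def snmpd_proxy_line(bmc_ip: str, community: str = "public",
                     subtree: str = DELL_SUBTREE) -> str:
    """snmpd.conf: answer requests under `subtree` by querying the BMC."""
    return f"proxy -v 2c -c {community} {bmc_ip} {subtree}"


def snmptrapd_lines(bmc_ip: str, trap_sink: str, community: str = "public") -> list[str]:
    """snmptrapd.conf: accept traps from the BMC only, then forward them.

    The "net" type is what authorizes forwarding; "log" also records them locally.
    """
    return [
        f"authCommunity log,net {community} {bmc_ip}",
        f"forward default {trap_sink}",
    ]
```

For example, `snmpd_proxy_line("192.0.2.10")` yields `proxy -v 2c -c public 192.0.2.10 .1.3.6.1.4.1.674.10892`. An SNMPv3 deployment would replace `-v 2c -c …` with the usual v3 user and security options.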
USB NIC Pass-Through
• Dedicated channel for OS-to-Service Processor communication.
• Service Processor at 169.254.0.1 (default). Non-routable.
• Automatic configuration with Avahi and nss-mdns, or with NetworkManager.
• The Service Processor can be reached as "idrac.local":
  – http://idrac.local
  – # ipmitool -I lan -H idrac.local
  – # snmpget idrac.local
[Figure: Operating System and Service Processor linked by a USB NIC on the server hardware]
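A host-side tool could sanity-check the pass-through link before using it. The sketch below assumes three things: that "non-routable" means the address is IPv4 link-local (169.254.0.0/16); that `idrac.local` resolves through the normal system resolver only when nss-mdns (or an equivalent) is configured; and that `-U` and `-f` are ipmitool's standard user and password-file options. The user name and password-file path are hypothetical placeholders.

```python
import ipaddress
import socket

DEFAULT_BMC_ADDR = "169.254.0.1"  # default from the slide


def is_pass_through_addr(addr: str) -> bool:
    """Link-local addresses (169.254.0.0/16) are never routed off the link."""
    return ipaddress.ip_address(addr).is_link_local


def resolve_bmc(name: str = "idrac.local") -> list[str]:
    """Resolve the BMC's mDNS name via the system resolver (needs nss-mdns)."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def ipmitool_lan_argv(host: str, user: str, password_file: str) -> list[str]:
    """Build the slide's `ipmitool -I lan -H host` call plus credentials."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-f", password_file]
```

On a configured host, `resolve_bmc()` should return a single link-local address; if it raises `socket.gaierror`, mDNS resolution is not set up.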
System Health
• Health of CPU, fans, temperatures, voltages, etc. is already available.
• Aggregate the above into a machine-readable "System Health" value.
• Available in-band and/or out-of-band.
• Can be used by cluster software, virtualization managers and cloud compute managers to make workload migration decisions.
• Available over SNMP or IPMI.
• SNMP redirection can make health available in-band.
[Figure: Health available from both the Service Processor and the Operating System on the server hardware]
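The slides do not say how component health is rolled up. One common policy is "worst of": the system is only as healthy as its least healthy component. The sketch below uses that policy with the SNMP status values (1 = other through 6 = nonRecoverable) and adds a hypothetical migration rule of the kind a cluster or cloud manager might apply. Both the roll-up policy and the threshold are assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    # Values follow the SNMP health enumeration used by the Service Processor.
    OTHER = 1
    UNKNOWN = 2
    OK = 3
    NON_CRITICAL = 4
    CRITICAL = 5
    NON_RECOVERABLE = 6


def aggregate(components: dict[str, Status]) -> Status:
    """Worst-of roll-up; OTHER/UNKNOWN count only when nothing else is known."""
    known = [s for s in components.values() if s >= Status.OK]
    if not known:
        return Status.UNKNOWN
    return max(known)


def should_migrate(health: Status) -> bool:
    """Hypothetical policy: move workloads off hosts at CRITICAL or worse."""
    return health >= Status.CRITICAL
```

Ordering the enum by severity lets `max` do the roll-up directly. A real manager might also weigh which component failed, for example a redundant fan versus a DIMM.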
System Health over IPMI and SNMP
IPMI
• raw 0x30 0x51
• Byte 5: Global and storage status
  – Bit 0 set = Storage status Normal
  – Bit 1 set = Storage status Error (non-critical)
  – Bit 2 set = Storage status Failed (critical)
  – Bit 3 set = Storage status Unknown
  – Bit 4 set = Global status Normal
  – Bit 5 set = Global status Error (non-critical)
  – Bit 6 set = Global status Failed (critical)
  – Bit 7 set = Global status Unknown
SNMP
• SNMPv2-SMI::enterprises.674.10892.5.2.2.0
  – 1: other -- the status is not one of the below.
  – 2: unknown -- not known or monitored.
  – 3: ok -- the status is ok.
  – 4: nonCritical -- the status is warning, non-critical.
  – 5: critical -- the status is critical (failure).
  – 6: nonRecoverable -- the status is non-recoverable (dead).
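Decoding byte 5 from the table above is a matter of splitting it into two nibbles: the low nibble is storage status and the high nibble is global status. The sketch assumes "byte 5" counts from 1 in the space-separated hex that `ipmitool raw 0x30 0x51` prints. That indexing is an assumption, not something the slide states. The function names are hypothetical.

```python
STATES = ("Normal", "Error (non-critical)", "Failed (critical)", "Unknown")


def decode_health_byte(value: int) -> dict[str, list[str]]:
    """Decode byte 5: bits 0-3 = storage status, bits 4-7 = global status."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    storage = [STATES[b] for b in range(4) if value & (1 << b)]
    overall = [STATES[b] for b in range(4) if value & (1 << (b + 4))]
    return {"storage": storage, "global": overall}


def response_byte5(raw_output: str) -> int:
    """Pick byte 5 (1-based, assumed) from ipmitool's hex output."""
    return int(raw_output.split()[4], 16)
```

For example, a byte value of 0x42 decodes to storage "Error (non-critical)" and global "Failed (critical)".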
Opportunities…
OS Event Logging in Service Processor
• Log OS events to the Service Processor to get a better understanding of the host OS:
  – OS Started
  – OS Stopped
  – OS Install Started
  – OS Install Stopped
  – OS Install Aborted
  – OS Install Failed
• Standard IPMI sensor events.
• Combined with OS name, OS version and power status information, this will help administrators and console software keep track of server state.
• SUSE's YaST2 hooks.
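The slide's events can be mapped onto standard IPMI sensor-specific events. The table below is based on one reading of the IPMI v2.0 sensor type codes: 1Fh "OS Boot" with install offsets 07h to 0Ah, and 20h "OS Stop / Shutdown". Verify it against the specification. The choices for "OS Started", "OS Stopped" and "OS Install Stopped" (treated as "installation completed") are assumptions, and the ipmitool installer-status patch may encode these events differently.

```python
OS_BOOT = 0x1F          # sensor type: Base OS Boot / Installation Status
OS_STOP = 0x20          # sensor type: OS Stop / Shutdown
SENSOR_SPECIFIC = 0x6F  # event/reading type code for sensor-specific offsets

# (sensor type, offset) per event; mapping is an assumption, see lead-in.
OS_EVENTS = {
    "OS Started": (OS_BOOT, 0x06),          # boot completed, device not specified
    "OS Stopped": (OS_STOP, 0x03),          # OS graceful shutdown
    "OS Install Started": (OS_BOOT, 0x07),
    "OS Install Stopped": (OS_BOOT, 0x08),  # assumed: installation completed
    "OS Install Aborted": (OS_BOOT, 0x09),
    "OS Install Failed": (OS_BOOT, 0x0A),
}


def event_fields(name: str) -> tuple[int, int, int]:
    """Return (sensor type, event type, event data 1) for an OS event.

    Event data 1 carries the offset in bits 3:0; bits 7:4 are left at 0,
    meaning event data 2 and 3 are unspecified.
    """
    sensor_type, offset = OS_EVENTS[name]
    return sensor_type, SENSOR_SPECIFIC, offset & 0x0F
```

An installer hook would send these fields to the BMC in a Platform Event Message, for example from a YaST2 hook script.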
Aid with Debugging
• OS configuration and logs are crucial for debugging.
• Logs might be unavailable if the system has locked up or there was a kernel panic.
• On application/kernel error:
  – Collect relevant configuration and logs.
  – Store them in the Service Processor.
  – They remain accessible out-of-band even with the host OS down.
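The collection step can be sketched as packing the selected files into a compressed archive small enough for the Service Processor's limited storage. The function names and the 1 MiB default limit are hypothetical. How the archive is transferred to the Service Processor is not covered by the slides, so the sketch stops at producing the bytes.

```python
import io
import tarfile
from pathlib import Path


def _pack(paths: list[str]) -> bytes:
    """Create an in-memory .tar.gz holding `paths` under their base names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=Path(p).name)
    return buf.getvalue()


def bundle_debug_files(paths: list[str], limit: int = 1 << 20) -> bytes:
    """Pack existing files in order, skipping any whose addition would push
    the archive past `limit` bytes (hypothetical BMC storage budget)."""
    chosen: list[str] = []
    for p in paths:
        if not Path(p).is_file():
            continue  # missing logs are common after a crash; skip them
        if len(_pack(chosen + [p])) <= limit:
            chosen.append(p)
    return _pack(chosen)
```

Re-packing on each step is quadratic, but it keeps the size check exact and is cheap for the handful of files a crash handler would collect.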
Automatic Configuration of Console Redirection
• Most headless servers use IPMI Serial Over LAN to access the remote server's console.
• The BIOS contains options to set up redirection to a serial console.
• The administrator has to duplicate the BIOS setup information on the kernel command line:
  – console=ttyS0,115200
• This overhead could be avoided if the kernel could read the BIOS serial port settings.
• ACPI already has SPCR (Serial Port Console Redirection table).
• Linux support was introduced in 2.4 and removed in 2.5.
• It would be nice to have something similar.
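A minimal sketch of deriving the console argument from the firmware's SPCR table (readable as root from /sys/firmware/acpi/tables/SPCR). It assumes the SPCR layout after the 36-byte ACPI header: the base address Generic Address Structure at offset 40, whose 64-bit address sits at offset 44, and the baud-rate code at offset 58 (3 = 9600, 4 = 19200, 6 = 57600, 7 = 115200). It also assumes that the legacy PC I/O ports map to ttyS0 through ttyS3. Other UART types are out of scope.

```python
import struct

BAUD = {3: 9600, 4: 19200, 6: 57600, 7: 115200}  # SPCR baud-rate codes
LEGACY_PORTS = {0x3F8: "ttyS0", 0x2F8: "ttyS1", 0x3E8: "ttyS2", 0x2E8: "ttyS3"}


def console_arg_from_spcr(table: bytes) -> str:
    """Derive a kernel console= argument from a raw ACPI SPCR table."""
    if len(table) < 63 or table[:4] != b"SPCR":
        raise ValueError("not an SPCR table")
    space_id = table[40]                              # GAS address space ID
    (address,) = struct.unpack_from("<Q", table, 44)  # GAS 64-bit base address
    if space_id != 1 or address not in LEGACY_PORTS:  # 1 = system I/O space
        raise ValueError("only legacy I/O-port UARTs are handled in this sketch")
    baud = BAUD.get(table[58])
    if baud is None:
        raise ValueError("baud rate not specified in SPCR")
    return f"console={LEGACY_PORTS[address]},{baud}"
```

On a system whose BIOS redirects to COM1 at 115200 baud, this would produce the same `console=ttyS0,115200` that administrators currently type by hand.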
References
• IPMI on Linux
  – http://openipmi.sourceforge.net/IPMI.pdf
  – http://ipmitool.sourceforge.net/
  – http://www.gnu.org/software/freeipmi/
• Related Projects
  – http://www.openlmi.org/
  – https://github.com/abrt/abrt/wiki/ABRT-Project
• Scripts
  – Exchange Information: http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/exchange-bmc-os-info.init.redhat
  – SNMP Redirection: http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/bmc-snmp-proxy
  – Installer Status Event Logging: http://sourceforge.net/p/ipmitool/patches/97/
  – Fedora Feature Page: http://fedoraproject.org/wiki/Features/AgentFreeManagement
• Dell iDRAC
  – http://en.community.dell.com/techcenter/systems-management/w/wiki/3204.dell-remote-access-controller-drac-idrac.aspx
Thank You!
• charles_rose@dell.com
• linux-poweredge@dell.com
Backup
Server Block Diagram