CRAY SV1 SuperCluster Resiliency
Mike Wolf, I/O development, SGI
41st Cray User Group Conference
Minneapolis, Minnesota

Resiliency Goals
Maintain cluster operations after a panic
• Ring Resiliency
• Auto-Recovery
• Failover

SuperCluster Resiliency: Ring Resiliency
• Operating System resets client chip
• Check xxx commands resetting client chip
• Proxy locking
• Dring monitor

SuperCluster Resiliency: Auto-Recovery
• Foundation / Monitoring
• User exits in check xxx commands
• Recovery
• Notification

SuperCluster Resiliency: Failover
• NFS
• UDB
• DCE/DFS
• BDS

Resiliency Example 1
SV1 SuperCluster Basic Building Block
[Diagram labels: Mainframe 1, Mainframe 2, Mainframe 3, Mainframe 4, MPN (x4), FCN, GigaRing, Ethernet, SWS]
• Mainframe 1 panics
• Mainframe 2 has packet backup
• Mainframe 2 hangs
• Mainframes 3 and 4 hang

Resiliency Example 2
[Diagram labels: Mainframe 1, Mainframe 2, Mainframe 3, Mainframe 4, MPN (x4), FCN, GigaRing, Ethernet, SWS]
• Mainframe 1 panics
• SWS stabilizes ring
• Mainframe 1 is back in service
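
The contrast between Example 1 and Example 2 can be sketched as a toy state model. This is an illustration only, not the SuperCluster implementation: the function names `run_ring` and `return_to_service`, the state strings, and the rule that a hang spreads to every other mainframe are assumptions made for the sketch. The slides state only the observed sequence of events, not the mechanism behind it.

```python
def run_ring(nodes, panicked, ring_stabilized):
    """Toy model: return each node's state after `panicked` panics.

    ring_stabilized=False follows Example 1: nothing stabilizes the ring,
    packets back up, and the remaining mainframes hang one after another.
    ring_stabilized=True follows Example 2: the ring is stabilized, so the
    remaining mainframes stay up.
    """
    state = {name: "up" for name in nodes}
    state[panicked] = "panicked"
    if ring_stabilized:
        return state
    # Assumption: the hang reaches every other node, in ring order.
    start = nodes.index(panicked)
    for offset in range(1, len(nodes)):
        downstream = nodes[(start + offset) % len(nodes)]
        state[downstream] = "hung"
    return state


def return_to_service(state, name):
    """Return a copy of `state` with `name` marked as up again."""
    recovered = dict(state)
    recovered[name] = "up"
    return recovered


mainframes = ["Mainframe 1", "Mainframe 2", "Mainframe 3", "Mainframe 4"]

# Example 1: one panic ends with every other mainframe hung.
example_1 = run_ring(mainframes, "Mainframe 1", ring_stabilized=False)

# Example 2: the ring is stabilized, then Mainframe 1 returns to service.
example_2 = run_ring(mainframes, "Mainframe 1", ring_stabilized=True)
example_2_final = return_to_service(example_2, "Mainframe 1")
```

In this sketch the only difference between the two outcomes is the `ring_stabilized` flag, which stands in for whatever combination of client chip reset and SWS action the real system uses.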
Recommend
More recommend