• Lustre Background
• Why Lustre Failover?
• How does Lustre Failover work?
• Automation on the Cray XT
• System configuration requirements
• Software configuration for failover
• Current limitations
• Future work
• Server nodes and services
  • MDS with one MDT per file system
  • OSS with one or more OSTs per file system
• Clients maintain connections to each service
• Failure detection is via network timeouts
[Diagram: clients connected to the MDS (mdt) and OSS1 (ost1, ost2)]
• Loss of a Lustre server currently requires a machine reboot
• The parallel file system is a critical resource for users
• Decreases MTTI and increases downtime
• Interrupts impact Service Level Agreements and customer satisfaction
  • Cray → Customer
  • Customer → Users
• Objective is to keep the system functioning while minimizing job loss
  • Regain machine functionality after Lustre server death
  • Access the same data and files by connecting to a backup server
• Primarily handles Lustre server death
  • Some documented cases of successful failover due to link failure
  • Depends on the nature of the network failure
• Warm-boot of Lustre servers
  • Uses the same recovery methods
• Lustre Failover is not able to handle RAID subsystem failures
  • Storage controllers
  • Service node HBA
  • Connection from the service node to the storage array
• Solutions to these are being investigated
[Diagram: clients connected to the MDS (mdt) and OSS1 (ost1, ost2)]
• OSS1 dies
  • Was serving ost0 and ost2
• ost0 and ost2 are started on OSS2
  • OSS2 waits for all clients to reconnect
• Client traffic to OSS1 times out
• Clients try to reconnect to OSS1; this also times out
• Clients connect to OSS2
• Clients replay outstanding transactions
• Clients start sending new I/O requests
[Diagram: clients, MDS (mdt), OSS1 (ost0, ost2) and OSS2 (ost1, ost3); after OSS1 fails, OSS2 also serves ost0 and ost2]
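For illustration, "starting" a failed-over OST on the backup server is, at the Lustre level, a mount of its backing device on that node. On the XT this is performed automatically by xt-lustre-proxy; the sketch below is a hedged manual equivalent, and the device labels and mount points are hypothetical.

  # Run on OSS2 after OSS1 is known to be dead (xt-lustre-proxy does this automatically).
  # Device labels and mount points below are hypothetical examples.
  mount -t lustre /dev/disk/by-label/lustre-OST0000 /mnt/lustre/ost0
  mount -t lustre /dev/disk/by-label/lustre-OST0002 /mnt/lustre/ost2
  # Each target then enters recovery, waiting for its clients to reconnect
  # and replay outstanding transactions before new I/O is accepted.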
• Automation components
  • State management
  • Health monitoring
  • Taking action
• XT automation achieved through the Cray-developed xt-lustre-proxy
  • Runs as a daemon on every Lustre server
  • CRMS framework for heartbeat events
  • SDB for configuration and maintaining current state
• CRMS heartbeat events
  • Existing node-failed event, sent when a node stops updating its heartbeat
  • Added a new Lustre service heartbeat
    • Uses the Lustre-provided /proc health check (see the sketch below)
    • If the health check fails, the proxy stops updating the Lustre service heartbeat
• Proxy and heartbeat
  • At startup, queries the SDB for its configuration
  • Registers for events for the services it is backing up
• On server death, the proxy takes action
  • "Shoots" the node via a CRMS event to ensure it stays dead
  • Starts services on the backup server
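A minimal sketch of the health-check loop described above, assuming the standard Lustre health file /proc/fs/lustre/health_check; the actual xt-lustre-proxy logic and its heartbeat mechanism are internal, so the helper below is hypothetical.

  #!/bin/sh
  # Poll Lustre health; once it stops reporting "healthy", stop refreshing
  # the Lustre service heartbeat so CRMS raises a failure event.
  HEALTH=/proc/fs/lustre/health_check
  while grep -q "^healthy" "$HEALTH" 2>/dev/null; do
      refresh_lustre_service_heartbeat   # hypothetical stand-in for the proxy's heartbeat update
      sleep 30
  done
  # Falling out of the loop means the node is unhealthy: no further heartbeats are sent.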
• OSS nodes are typically configured in active/active mode
  • Requires storage connectivity from both nodes
• MDS nodes are configured in active/passive mode
  • Requires a backup MDS on a separate SIO node
  • If the system has multiple file systems, the MDS nodes can be configured in active/active mode
• Hardware configuration changes
  • Cabling, zoning
  • Non-mirrored write cache turned off
• Limits on OSTs per OSS
  • Failover doubles the number of OSTs a server may host, so the surviving node must be able to carry the load
• Changes to FILESYSTEM.fs_defs
  • OSTDEV[0]="nid00060:/dev/sda1 nid00063:/dev/sdb1"
  • AUTO_FAILOVER=yes
• lustre_control scripts
  • 'generate_config.sh' generates the CSV data files for proxy configuration
  • 'lustre_control.sh FILESYSTEM.fs_defs write_conf' pushes the CSV tables into the SDB
• Manual configuration
  • xtfilesys2db, xtlustreserv2db, xtlustrefailover2db
  • xtlusfoadmin
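Putting the pieces above together, a hedged sketch of the configuration workflow; only the commands named on this slide are used, and exact arguments and invocation order may differ by release.

  # In FILESYSTEM.fs_defs: list the primary and backup server for each OST
  # and enable automatic failover
  OSTDEV[0]="nid00060:/dev/sda1 nid00063:/dev/sdb1"
  AUTO_FAILOVER=yes

  # Generate the CSV data files consumed by xt-lustre-proxy
  # (the argument form shown here is an assumption)
  generate_config.sh FILESYSTEM.fs_defs

  # Push the resulting tables into the SDB
  lustre_control.sh FILESYSTEM.fs_defs write_conf

  # Inspect or adjust failover configuration and state
  xtlusfoadmin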
• Failover duration is not optimal
  • Usually 10-15 minutes; can take up to 30 minutes
• Quotas and MDS failover
  • Known issues in XT 2.2; working with Sun at high priority
• Some job loss is inevitable
  • Users with tight batch wall-clock limits
  • Client death during failover
• Manual failback
• Multiple file systems and lustre_control configuration
  • Documented solutions
• Manual status monitoring
  • lctl get_param *.*.recovery_status
      status: RECOVERING
      recovery_start: 1236123918
      time_remaining: 886
      connected_clients: 1/178
      completed_clients: 1/178
      replayed_requests: 0/??
      queued_requests: 0
      next_transno: 1268285
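A small sketch of how the query above can be used to watch a failed-over server until recovery completes; the wildcard parameter syntax is taken from the slide, and the polling interval is arbitrary.

  # Run on the server node; loop until no target still reports RECOVERING
  # in its recovery_status output.
  while lctl get_param '*.*.recovery_status' | grep -q RECOVERING; do
      sleep 60
  done
  echo "Lustre recovery complete"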
• Imperative Recovery
  • Working with Sun to develop the feature
  • Forces clients to reconnect and stops the server waiting on dead clients
  • Reduces failover times to under 5 minutes, ideally 1 to 3 minutes
• Version Based Recovery
  • Minimizes evictions caused by unconnected clients
  • Only transactions requiring missing data will fail
• Adaptive Timeouts
• Gemini Network
  • Allows shorter network timeouts and positive feedback on dead peers
  • Targeted for the Danube release