Efficient and Scalable Operating System Provisioning with Kadeploy3
Luc Sarzyniec <luc.sarzyniec@inria.fr>
Grid’5000

Plan
1 Introduction
  Use cases
  Challenges
  Key features
2 Kadeploy internals
3 Example usages at large scale
4 Conclusion

Use cases
• System administration for HPC clusters
  ◮ Install and configure a large number of nodes
  ◮ Manage a library of pre-configured system images
  ◮ Reliability of the installation process
  ◮ Hardware compatibility
• Scientific and experimental context (Grid’5000)
  ◮ Launch experiments in a clean environment
  ◮ Custom environments (specific libraries, OS)
  ◮ Execute root commands
• History
  ◮ 2001-2008: CLIC, Grenoble (kadeploy 1,2)
  ◮ 2008-2011: Aladdin-G5K (kadeploy 3)
  ◮ 2011-2013: Inria ADT Kadeploy

Challenges
• Large-scale usage (Grid’5000, production clusters)
  ◮ Efficiency
  ◮ Reliability
  ◮ Scalability
• Different kinds of usage
  ◮ Users: newbies → experts
  ◮ Command line or scripts
• Ecosystem
  ◮ Usage of standard technologies
  ◮ Software/hardware independent
• Interaction with other technologies
  ◮ Batch scheduler
  ◮ Network isolation

Key features
• Fast and reliable deployment process
• Support for any kind of OS (Linux, BSD, Windows, ...)
• Hardware independent
• Rights management (karights)
  ◮ Integration with batch schedulers
  ◮ Users’ custom system images
• System image library management (kaenv)
• Statistics collection (kastat)
• Frontend to low-level tools
  ◮ reboot (kareboot)
  ◮ power on/off (kapower)
  ◮ serial console (kaconsole)
• Simple: kadeploy -e debian-base -m node[1-42].domain.local
• Scriptable deployments (client-server architecture)
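
The node range in the command above, node[1-42].domain.local, stands for 42 host names. As a minimal sketch of what such a range means when scripting deployments, the helper below expands one bracketed numeric range into a list of names. The function name expand_nodeset is hypothetical, and the sketch only handles a single NAME[first-last]SUFFIX pattern; it is not Kadeploy's own parser, whose accepted syntax is not described in these slides.

```python
import re


def expand_nodeset(pattern: str) -> list[str]:
    """Expand a single bracketed numeric range into host names.

    Hypothetical helper for illustration only: it handles exactly one
    '[first-last]' range and returns the pattern unchanged otherwise.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range in the pattern: treat it as a single host name.
        return [pattern]
    prefix, first, last, suffix = match.groups()
    low, high = int(first), int(last)
    if low > high:
        raise ValueError(f"empty range in {pattern!r}")
    return [f"{prefix}{index}{suffix}" for index in range(low, high + 1)]


if __name__ == "__main__":
    nodes = expand_nodeset("node[1-42].domain.local")
    print(len(nodes), nodes[0], nodes[-1])
```

A script could pass the resulting list to whatever deployment client it uses; how the list is handed over is left open here.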

Plan
1 Introduction
2 Kadeploy internals
  Boot over network
  Deployment process overview
  Automata for reliable deployment
  Reboot and Power operations
  Parallel operations
  File broadcast methods
3 Example usages at large scale
4 Conclusion

Boot over network
• Based on the PXE protocol
• Standard technology, implemented by network cards
• Several network bootloader implementations (PXElinux, GPXElinux, iPXE)
• Several methods to retrieve the kernel to boot (TFTP, HTTP)

Deployment process overview
[Figure labels: Kadeploy, TFTP/HTTP, DHCP]
1. Reboot the nodes
  ◮ Create PXE profile files
  ◮ Trigger remote reboot
2. Prepare and install the nodes
  ◮ Boot on the minimal system
  ◮ Prepare nodes
  ◮ Send the system image
  ◮ Install and configure the system
3. Reboot on the installed system
  ◮ Update PXE and remote reboot
  ◮ Nodes boot on the new system
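
To make the "Create PXE profile files" step concrete, here is a minimal sketch assuming the PXELINUX bootloader, which looks for a per-node configuration file named after the node's IPv4 address written in uppercase hexadecimal. The kernel and initrd paths, the label name and the kernel command line are hypothetical placeholders; the profiles Kadeploy actually writes are not shown in these slides.

```python
import ipaddress


def pxelinux_config_name(ipv4: str) -> str:
    """Return the per-node PXELINUX file name: the IPv4 address as 8 hex digits."""
    return f"{int(ipaddress.IPv4Address(ipv4)):08X}"


def render_profile(kernel: str, initrd: str, cmdline: str) -> str:
    """Render a small PXELINUX profile booting one kernel with one initrd."""
    lines = [
        "DEFAULT deploy",
        "LABEL deploy",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} {cmdline}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths relative to the TFTP root; adjust to the real layout.
    profile = render_profile(
        kernel="kernels/deploy-vmlinuz",
        initrd="kernels/deploy-initrd.img",
        cmdline="console=ttyS0",
    )
    print("pxelinux.cfg/" + pxelinux_config_name("192.168.0.1"))
    print(profile, end="")
```

Writing one such file per node before the reboot, then replacing it before the final reboot, mirrors steps 1 and 3 of the overview.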

Automata for reliable deployment
Kadeploy deployment process management:
• Process split into 3 macro steps
• Retries and a timeout for each macro step
• Split the nodeset if some nodes fail
• Fallback macro steps (Final reboot: Kexec → HardReboot)
[Diagram: the three macro steps]
Macrostep 1: Min. env. setup
  ◮ Configure PXE profiles on TFTP or HTTP server
  ◮ Trigger reboot
  ◮ Wait for nodes to reboot
  ◮ Configure nodes (partition disk, ...)
Macrostep 2: Env. installation
  ◮ Broadcast system image to nodes
  ◮ Do post-installation customization of nodes
Macrostep 3: Final reboot
  ◮ Configure PXE profiles on TFTP or HTTP server
  ◮ Trigger reboot using IPMI or SSH
  ◮ Wait for nodes to reboot on deployed environment
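
The retry, nodeset-splitting and fallback behaviour listed above can be sketched as a small loop. This is an illustration under stated assumptions, not Kadeploy's implementation: each macro step is modelled as a list of interchangeable implementations (for example Kexec, then HardReboot), each implementation is a plain function returning the nodes that succeeded, and timeouts are left out. All names are hypothetical.

```python
from typing import Callable

# An implementation takes the nodes to process and returns those that succeeded.
Step = Callable[[set[str]], set[str]]


def run_macrostep(nodes: set[str], implementations: list[Step],
                  retries: int) -> tuple[set[str], set[str]]:
    """Run one macro step; return (succeeded, failed) nodes.

    Only the nodes that are still failing are retried, and the next
    implementation in the list acts as a fallback for them.
    """
    succeeded: set[str] = set()
    pending = set(nodes)
    for implementation in implementations:
        for _ in range(retries + 1):
            if not pending:
                break
            done = implementation(set(pending)) & pending
            succeeded |= done
            pending -= done
    return succeeded, pending


def deploy(nodes: set[str], macrosteps: list[list[Step]],
           retries: int = 1) -> tuple[set[str], set[str]]:
    """Chain the macro steps; failed nodes are split off, the rest continue."""
    alive, failed = set(nodes), set()
    for implementations in macrosteps:
        alive, dropped = run_macrostep(alive, implementations, retries)
        failed |= dropped
    return alive, failed


if __name__ == "__main__":
    def setup(nodes):        # macro step 1 always succeeds in this demo
        return set(nodes)

    def broadcast(nodes):    # macro step 2 never succeeds on node3
        return {n for n in nodes if n != "node3"}

    def kexec(nodes):        # preferred final reboot, fails on node2
        return {n for n in nodes if n != "node2"}

    def hard_reboot(nodes):  # fallback final reboot
        return set(nodes)

    ok, ko = deploy({"node1", "node2", "node3"},
                    [[setup], [broadcast], [kexec, hard_reboot]])
    print(sorted(ok), sorted(ko))
```

In the demo, node3 is dropped after the second macro step while node1 and node2 carry on, and node2 only completes the final reboot through the fallback.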

Reboot and Power operations
• Critical part of the software
• Escalation through several levels of commands
• Compatible with remote hardware management protocols
• Administrator-defined commands
  ◮ soft reboot: direct execution of the reboot command
  ◮ hard reboot: hardware remote reboot mechanism such as IPMI
  ◮ very hard: remote control of the power distribution unit (PDU)
• Managing groups of nodes (e.g. PDU reboots)
• Windowed operations (DHCP DoS, electric hazard)
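
A minimal sketch of the escalation and windowing ideas above, under stated assumptions: each level (soft, hard, very hard) is an administrator-supplied function that returns True on success, levels are tried in order until one works, and nodes are processed in fixed-size windows so that only a bounded number reboot at once. The function names are hypothetical, and the demo processes each window sequentially where a real tool would run it in parallel.

```python
from typing import Callable, Iterator, Optional, Sequence

# A command takes a node name and reports whether the reboot succeeded.
Command = Callable[[str], bool]


def windows(nodes: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive batches of at most 'size' nodes."""
    if size < 1:
        raise ValueError("window size must be at least 1")
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def reboot_with_escalation(node: str,
                           levels: Sequence[tuple[str, Command]]) -> Optional[str]:
    """Try each level in order; return the name of the level that worked."""
    for name, command in levels:
        if command(node):
            return name
    return None


def reboot_all(nodes: Sequence[str], levels: Sequence[tuple[str, Command]],
               window_size: int) -> dict[str, Optional[str]]:
    """Reboot nodes window by window, escalating per node when needed."""
    results: dict[str, Optional[str]] = {}
    for batch in windows(nodes, window_size):
        for node in batch:
            results[node] = reboot_with_escalation(node, levels)
    return results


if __name__ == "__main__":
    levels = [
        ("soft", lambda node: node != "n2"),  # pretend n2 is unreachable
        ("hard", lambda node: True),          # stand-in for an IPMI reset
        ("very hard", lambda node: True),     # stand-in for a PDU power cycle
    ]
    print(reboot_all(["n1", "n2", "n3"], levels, window_size=2))
```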

Parallel operations
Remote commands, TakTuk based
• Hierarchical connections between the nodes
• Adaptive work-stealing algorithm
• Auto-propagation mechanism
File broadcast, Kastafior based
• Chain-based broadcast
• Initialization of the chain with a tree-based parallel command
• Saturation of full-duplex networks in both directions
• Other methods available: Chain, TakTuk, BitTorrent

File broadcast methods
[Figures: P2P file broadcast from the images server; topology-aware chained file broadcast from the images server]
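
Why a chained broadcast scales can be seen in a toy model, sketched here under simplifying assumptions: the image is cut into equal chunks, and in one time step every link of the chain carries at most one chunk, so each node receives from its predecessor while forwarding to its successor (the full-duplex case). This is not Kastafior's implementation, and simulate_chain_broadcast is a hypothetical name.

```python
def simulate_chain_broadcast(n_nodes: int, n_chunks: int) -> int:
    """Return the number of steps until the last node holds every chunk.

    Index 0 of 'received' is the image server, which holds all chunks
    from the start; indexes 1..n_nodes are the nodes of the chain.
    """
    if n_nodes < 1 or n_chunks < 1:
        raise ValueError("need at least one node and one chunk")
    received = [n_chunks] + [0] * n_nodes
    steps = 0
    while received[-1] < n_chunks:
        # Walk from the tail to the head so a chunk moves one hop per step.
        for index in range(n_nodes, 0, -1):
            if received[index - 1] > received[index]:
                received[index] += 1
        steps += 1
    return steps


if __name__ == "__main__":
    for nodes in (1, 10, 130):
        print(nodes, "nodes:", simulate_chain_broadcast(nodes, n_chunks=100), "steps")
```

In this model the step count is n_chunks + n_nodes - 1, so with many small chunks the total time is dominated by the image size rather than by the number of nodes.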

Plan
1 Introduction
2 Kadeploy internals
3 Example usages at large scale
  Kadeploy on Grid’5000
  Installing a cloud of VMs with Kadeploy
4 Conclusion

Kadeploy on Grid’5000
Grid’5000 deployment statistics (since 2009)
• 620 users
• Total: 170,000 deployments
• Average: 10.3 nodes
• Largest: 635 nodes (multi-site)
Benchmark
• 130 nodes of graphene from the Nancy site
• 5 deployments of a 137MB environment (Small)
• 5 deployments of a 1429MB environment (Big)

Deployment steps                             Small     Big
Average time in first and last reboots       3m 58s (single value given)
Average file broadcast/decompression time    31s       2m 6s
Average deployment time                      9m 36s    11m 15s
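
The benchmark figures can be put in proportion with a little arithmetic. The sketch below only reuses the numbers from the table above; the derived ratios are computed here and are not reported in the slides. The rate it prints is an effective per-node figure that includes decompression, so it understates the raw network throughput. The helper names are hypothetical.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration written like '9m 36s', '31s' or '2m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", duration.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return 60 * minutes + seconds


# Figures copied from the benchmark table (sizes in MB).
BENCHMARK = {
    "Small": {"size_mb": 137, "broadcast": "31s", "total": "9m 36s"},
    "Big": {"size_mb": 1429, "broadcast": "2m 6s", "total": "11m 15s"},
}


def broadcast_share(row: dict) -> float:
    """Fraction of the whole deployment spent in broadcast and decompression."""
    return to_seconds(row["broadcast"]) / to_seconds(row["total"])


def effective_rate(row: dict) -> float:
    """Image size divided by broadcast/decompression time, in MB/s per node."""
    return row["size_mb"] / to_seconds(row["broadcast"])


if __name__ == "__main__":
    for name, row in BENCHMARK.items():
        print(f"{name}: broadcast share {broadcast_share(row):.1%}, "
              f"effective rate {effective_rate(row):.1f} MB/s per node")
```

The Big environment is about ten times larger, yet the whole deployment takes only 99 s longer, of which 95 s is the extra broadcast and decompression time.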

Installing a cloud of VMs with Kadeploy
Virtualized infrastructure
• 4000 VMs on 635 nodes (4 Grid’5000 sites)
• 10-20 ms latency
• 1 single virtual cluster
Virtual machines
• 1 VM per core
• 914MB RAM per VM (disk: 564MB, VM: 350MB)
• 3-18 VMs per node
Deployment results
• 430MB environment
• 57 minutes of deployment
• 3838 nodes deployed successfully (96%)
[Map labels: Lille, Luxembourg, Reims, Nancy, Rennes, Lyon, Grenoble, Bordeaux, Toulouse, Sophia]

Conclusion
• Scalable OS provisioning for HPC clusters
• Small infrastructure cost
• Efficient and fault-tolerant
• Stable, in production on Grid’5000 since 2009
• Actively supported and developed
