A Tool for Environment Deployment in Clusters and Light Grids
Presented by Guillaume Huard
Yiannis Georgiou, Julien Leduc, Brice Videau, Johann Peyrard and Olivier Richard
Laboratoire Informatique et Distribution, Grenoble, France
Mescal Project
Outline
  Introduction
  Related Work and a new approach
  Kadeploy2 Environment deployment tool
  Performance Evaluations
  Work in Progress
  Conclusion and Perspectives
Outline
  Introduction
    Introduction and Motivations
    Environment deployment
Introduction and Motivations
◮ Clusters and grids for High Performance Computing
◮ Exploitation and administration
◮ Issues:
  ◮ Cluster: software installation and configuration
  ◮ Grid: software heterogeneity among clusters
◮ Experimental platforms, research
  ◮ Need for various software environments for the experiments
[Figure: a cluster. Labels: Computing Nodes, Servers, User gateway, Central services (resource administrator, NFS, authentication, ...)]
Environment deployment
Proposed solution:
◮ Environment deployment tool
[Figure: software stack. Labels: Applications, Tools, Middleware, Distro, OS (Linux, FreeBSD, ...), Hardware, Network, Environment, Specifiable, Configurable]
◮ Typical sequence of an environment deployment:
  1) Submission of requested nodes at the batch scheduler
  2) Environment deployment
  3) Work on the environment
  4) Work finishes, nodes return to the initial reference environment
[Figure: cycle through steps 1 to 4. Labels: Environment creation, New experiment]
Outline
  Related Work and a new approach
    Related Work
    A new way of exploitation based on the deployment operation
Related Work
◮ Cluster management tools (Rocks, LCFG, Quattor, Oscar, ...)
  ◮ Automated installation, configuration and management
◮ Imaging tools (Partimage, g4u, Frisbee, ...)
  ◮ Creation of a disk/partition image
  ◮ Used mostly for maintenance, but also for installation
◮ Automated installation tools (SIS, Kickstart, ...)
  ◮ OS and software installation and configuration
◮ Virtualization (Xen/XenoServer, VServer, ...)
  ◮ Different approach: flexible infrastructure
  ◮ But what if we want to evaluate virtualization itself?
Related Work: Synthesis
◮ Cluster and grid exploitation -> various existing software solutions
◮ Environment deployment operations -> only in the installation or maintenance phases
◮ Solutions not as flexible as desired
Kadeploy2: A new way of cluster and grid exploitation
◮ Enables every user to run the deployment operation on a cluster or grid and deploy the environment of their choice
[Figure: deployment from a server to nodes. Labels: Environment Image 1, Environment Image 2, Server, Node, Protected Partitions, Available Partitions]
◮ Solution to the software environment homogeneity problem on a cluster or light grid
  ◮ Light grid: a simplified grid with homogeneous services and administration
◮ Fast and robust deployment tool that provides:
  ◮ Access control for every user to the deployment operations
  ◮ A simple method of environment creation
Outline
  Kadeploy2 Environment deployment tool
    Kadeploy2: Architecture
    Kadeploy2: Deployment procedure
    Kadeploy2: Deployment procedure optimizations
Kadeploy2: Architecture and Principles
[Figure: client/server architecture. Labels: Users, Submission, Batch Scheduler, Environment Repository, Kadeploy2, Database, Diffusion Mechanism, Network booting protocols, Hardware reboot mechanism, Computing Nodes, Client, Server]
◮ Relies on the usual protocols for network booting (PXE, TFTP, DHCP)
◮ Uses a database (MySQL): state, configuration, environment descriptions
◮ Fast mechanism for environment diffusion: pipeline (flat tree, chain)
◮ Integration with the resource manager (e.g. PBS, OAR, ...)
◮ Deployment process: transfers the new environment to every computing node (on a specific partition)
◮ Robustness (hardware side): use of remote mechanisms for hardware reboot
◮ Environment creation: simple archiving of the root partition in compressed tar format
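To illustrate why a chained pipeline suits environment diffusion, here is a minimal Python sketch of a timing model. It is not Kadeploy2 code: the chunked transfer, the one-chunk-per-slot link model, the function names and the sequential baseline used for comparison are all assumptions made for illustration.

```python
# Illustrative timing model only; not taken from Kadeploy2.
# Assumption: the image is split into equal chunks and every link moves
# one chunk per time slot.

def sequential_slots(nodes: int, chunks: int) -> int:
    """Hypothetical baseline: the server sends the whole image to each node in turn."""
    return nodes * chunks


def chain_slots(nodes: int, chunks: int) -> int:
    """Closed form for a chain: server -> node 1 -> node 2 -> ... -> node N."""
    if nodes == 0 or chunks == 0:
        return 0
    # The last chunk leaves the server in slot `chunks`, then needs
    # (nodes - 1) further hops to reach the end of the chain.
    return chunks + nodes - 1


def simulate_chain(nodes: int, chunks: int) -> int:
    """Slot-by-slot simulation of the chain, used to check the closed form."""
    received = [chunks] + [0] * nodes  # index 0 is the server
    slots = 0
    while received[-1] < chunks:
        # Decide all transfers from the state at the start of the slot,
        # so a chunk advances by at most one hop per slot.
        senders = [i for i in range(nodes) if received[i] > received[i + 1]]
        for i in senders:
            received[i + 1] += 1
        slots += 1
    return slots
```

With 180 nodes and a hypothetical 100 chunks, `chain_slots(180, 100)` gives 279 slots against 18000 for the sequential baseline: in this model the chain's cost grows by one slot per extra node rather than by one full image transfer.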
The deployment procedure: Concepts
◮ Procedure controlled by a minimal system (mini-kernel, initrd)
  ◮ Memory mounted
◮ Preinstallation
  ◮ Disk partitioning
◮ Transfer + Write
  ◮ Environment diffusion on the deployment partition
◮ Post-installation
  ◮ Finalizes the configuration of services that lack autoconfiguration procedures
◮ Robustness (software side)
  ◮ Failing nodes excluded by timeouts
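The idea of excluding failing nodes by timeouts can be sketched as follows. This is a minimal Python illustration, not the Kadeploy2 implementation: the function names, the thread-based parallelism, the node names and the choice to treat a raised error as a failure are assumptions.

```python
# Illustrative sketch only; not Kadeploy2 code.
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_step(nodes, step, timeout_s):
    """Run step(node) on all nodes in parallel.

    Returns (ok, excluded): nodes that finished in time, and nodes that
    timed out or raised (treating errors as failures is an assumption).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(step, node): node for node in nodes}
    done, _pending = wait(list(futures), timeout=timeout_s)
    ok = {futures[f] for f in done if f.exception() is None}
    excluded = [n for n in nodes if n not in ok]
    # Do not block on stragglers; they keep running in the background here.
    pool.shutdown(wait=False)
    return sorted(ok), excluded


def simulated_step(node):
    """Hypothetical stand-in for a deployment step on one node."""
    if node == "node-3":
        time.sleep(1.0)  # simulates a node that hangs past the timeout
    return node
```

Calling `run_step(["node-1", "node-2", "node-3"], simulated_step, timeout_s=0.2)` keeps the first two nodes and excludes `node-3`, so the remaining steps proceed on the healthy nodes only.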
Deployment procedure steps and time chart
  1) Submission
  2) Attribution / Session opening
  3) Deployment permission controls
  4) Boot deployment minimal-system
  5) Preinstallation
  6) Environment propagation + Decompression on the partition
  7) Postinstallation
  8) Order of reboot
  9) Boot on the new environment
  10) Work on the environment
  11) Session end indication
  12) Deployment permission rights withdrawal / End of session
  13) Order of reboot
  14) Boot on the reference environment
[Figure: states of a computing node. Reference environment (steps 1, 2, 3) -> reboot (4) -> Minimal-system (steps 5, 6, 7, 8) -> reboot (9) -> User environment (steps 10, 11, 12, 13) -> reboot (14) -> Reference environment]
[Figure: deployment timetable, rows 1 to 14]
◮ Steps 4, 9, 14 (reboot phases) are the most time consuming
◮ This motivates the optimizations
1st Optimization method: nomini
  1) Submission
  2) Attribution / Session opening
  3) Deployment permission controls
  4) Reference environment preparation
  5) Preinstallation
  6) Environment propagation + Decompression on the partition
  7) Postinstallation
  8) Order of reboot
  9) Boot on the new environment
  10) Work on the environment
  11) Session end indication
  12) Deployment permission rights withdrawal / End of session
  13) Order of reboot
  14) Boot on the reference environment
[Figure: states of a computing node. Reference environment (steps 1 to 8) -> reboot (9) -> User environment (steps 10, 11, 12, 13) -> reboot (14) -> Reference environment]
◮ 1st reboot elimination: procedure controlled by the reference environment (no minimal-system)
Constraints:
◮ Diffusion mechanism installed on the reference environment
◮ Deployment on a different partition than the current root
Robustness -> guaranteed (same arguments as the default method)
2nd Optimization method: pivot
  1) Submission
  2) Attribution / Session opening
  3) Deployment permission controls
  4) Reference environment preparation
  5) Preinstallation
  6) Environment propagation + Decompression on the partition
  7) Postinstallation
  8) Reference environment preparation
  9) Change the root file system (user environment) + services launching
  10) Work on the environment
  11) Session end indication
  12) Deployment permission rights withdrawal / End of session
  13) Change the root file system (reference environment) + services launching
[Figure: states of a computing node. Reference environment (steps 1 to 8) -> pivot (9) -> User environment (steps 10, 11, 12) -> un-pivot (13) -> Reference environment]
◮ Extension of the nomini method (1st reboot eliminated)
◮ 2nd reboot elimination: just change the root filesystem, using the system command pivot_root
◮ This is reversible -> 3rd reboot elimination
Drawbacks:
◮ Same constraints as the 1st optimization method (it is based on it)
◮ Cannot change the kernel or the kernel parameters
Robustness -> unchanged
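The reboot phases of the three methods can be tabulated in a few lines of Python. The step numbers come from the slides; the data structure and function names are only an illustrative way to organize them.

```python
# Reboot bookkeeping as described on the slides; the data structure and
# function names are illustrative.
REBOOT_STEPS = {
    "default": (4, 9, 14),  # minimal system, new environment, reference environment
    "nomini": (9, 14),      # no boot of the minimal system
    "pivot": (),            # root changed with pivot_root at steps 9 and 13 instead
}


def reboots(method: str) -> int:
    """Number of reboot phases in one full deployment cycle."""
    return len(REBOOT_STEPS[method])


def reboots_saved(method: str) -> int:
    """Reboot phases avoided compared with the default method."""
    return reboots("default") - reboots(method)
```

Here `reboots_saved("nomini")` is 1 and `reboots_saved("pivot")` is 3, matching the first, second and third reboot eliminations listed above.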
Outline
  Performance Evaluations
    Time to deploy at the cluster level
    Time to deploy at the grid level
Execution time of a deployment
◮ Platform: Grid5000, the French nationwide experimental grid:
  ◮ 9 geographically distributed sites
  ◮ Every site hosts 1 to 3 clusters (from 256 CPUs to 1K CPUs)
  ◮ All sites connected by RENATER (French academic network) -> 10 Gbit/s (2006)
◮ 2 clusters used for our performance measurements:
  ◮ GDX cluster, LRI Laboratory @ Orsay (AMD Opteron biprocessor 2 GHz, 2 GB RAM, Gigabit Ethernet)
  ◮ Sophia cluster, INRIA Laboratory @ Sophia-Antipolis (AMD Opteron biprocessor 2 GHz, 2 GB RAM, Myrinet/Gigabit Ethernet)
Default deployment method
◮ Metric: time to reach each of the 5 most time-consuming steps in the deployment procedure (from session opening to boot on the desired environment)
◮ GDX cluster (180 nodes)
[Figure: "Kadeploy2 default deployment method on GDX cluster". x-axis: #nodes (0 to 200); y-axis: time (sec), 0 to 500. Curves: reboot, first check; preinstall; environment propagation + copy; postinstall; reboot, last check]
◮ Bottom curve: time to boot the minimal system
◮ Upper curve: total time to boot the desired environment
Comparison of deployment methods
[Figure: "Kadeploy2 deployment procedure (Methods)". x-axis: #nodes (0 to 200); y-axis: time (sec), 0 to 500. Curves: default, nomini, pivot]
◮ The optimization methods are 70 to 160 sec faster
Deployment on a lightweight grid of 2 clusters
◮ Time diagram of a deployment (default method) on 2 Grid5000 sites using 260 nodes (180 nodes in site 1 (GDX) and 80 nodes in site 2 (Sophia))
[Figure: x-axis: time (sec), 0 to 1200; y-axis: # nodes, 0 to 300. Curves: deploying, deployed, deploying_site1, deployed_site1, deploying_site2, deployed_site2]
◮ Current "boot-to-boot" time at grid level: 450 seconds