Scalability Testing of Kadeploy using Virtual Machines on Grid’5000
Luc Sarzyniec, Sébastien Badia, Emmanuel Jeanvoine, Lucas Nussbaum
Grid’5000
Kadeploy – OS provisioning for clusters
◮ Used by sysadmins to install/reinstall compute nodes
◮ Designed for scalability
  ◮ That matters: faster reinstallation → shorter downtime
◮ Built on top of PXE, DHCP, TFTP (or HTTP)
◮ Supports a broad range of systems (Linux, Xen, *BSD, etc.)
◮ Manages a catalog of images and user permissions
◮ Open Source (GPL): http://kadeploy3.gforge.inria.fr/
Process overview
[Diagram labels: Kadeploy, DHCP, TFTP/HTTP]
1. Kadeploy configures PXE profiles
2. Kadeploy triggers reboot using IPMI or SSH
3. Nodes boot to minimal deployment system sent over the network
4. Kadeploy configures nodes and sends system image
5. Kadeploy configures PXE profiles again and triggers reboot
6. Nodes boot to newly installed system
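The six steps above can be read as a fixed sequence applied to a set of nodes. The sketch below is only an illustration of that ordering, not Kadeploy code: the `Step` names, the `deploy` function, the `run_step` callback, and the rule that a node failing a step is dropped from the remaining steps are all assumptions introduced here.

```python
from enum import Enum, auto


class Step(Enum):
    """The six steps of the process overview (names are hypothetical)."""
    CONFIGURE_PXE = auto()             # 1. configure PXE profiles
    TRIGGER_REBOOT = auto()            # 2. reboot using IPMI or SSH
    BOOT_DEPLOYMENT_SYSTEM = auto()    # 3. boot minimal deployment system
    CONFIGURE_AND_SEND_IMAGE = auto()  # 4. configure nodes, send system image
    RECONFIGURE_PXE_AND_REBOOT = auto()  # 5. configure PXE again, reboot
    BOOT_INSTALLED_SYSTEM = auto()     # 6. boot newly installed system


WORKFLOW = list(Step)  # Enum members iterate in definition order


def deploy(nodes, run_step):
    """Run every step, in order, on the nodes that are still alive.

    run_step(step, node) is a caller-supplied callback returning True on
    success. Assumption of this sketch: a node that fails a step is
    recorded and excluded from the later steps.
    """
    alive = list(nodes)
    failed = {}
    for step in WORKFLOW:
        survivors = []
        for node in alive:
            if run_step(step, node):
                survivors.append(node)
            else:
                failed[node] = step
        alive = survivors
    return alive, failed


def _demo_run_step(step, node):
    # Hypothetical behaviour: node "n2" never comes up in the deployment system.
    return not (node == "n2" and step is Step.BOOT_DEPLOYMENT_SYSTEM)


deployed, failures = deploy(["n1", "n2", "n3"], _demo_run_step)
```

In this sketch each step is applied node by node; how the commands of each step actually reach thousands of nodes is the subject of the next slides.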
Scalable remote command execution with TakTuk
◮ Sequential?
◮ Sequential + sliding window (pdsh-like)?
In Kadeploy: tree-based → logarithmic complexity (vs linear)
◮ using TakTuk – http://taktuk.gforge.inria.fr/
◮ HPDC’2009 paper: B. Claudel, G. Huard and O. Richard. TakTuk, Adaptive Deployment of Remote Executions.
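To make the "logarithmic vs linear" contrast concrete, here is a rough count of how many connection rounds each strategy needs to reach n nodes. It is a simplified model, not TakTuk's algorithm: every connection is assumed to take one time unit, the window size of 32 and the fixed tree arity of 2 are arbitrary choices, and the function names are hypothetical. TakTuk itself builds its tree adaptively.

```python
import math


def sequential_rounds(n):
    """One node contacted after another: n rounds."""
    return n


def windowed_rounds(n, window):
    """At most `window` connections in flight at once: about n / window rounds."""
    return math.ceil(n / window)


def tree_rounds(n, arity):
    """Depth of a tree in which every reached node contacts `arity` children.

    Level d of the tree holds arity**d nodes, so the number of nodes
    covered grows geometrically and the depth grows like log(n).
    """
    depth = 0
    covered = 0
    level_size = 1  # the root (deployment server) is not one of the n targets
    while covered < n:
        level_size *= arity
        covered += level_size
        depth += 1
    return depth


n = 3999  # number of virtual nodes reported later in the talk
seq = sequential_rounds(n)
win = windowed_rounds(n, window=32)
tree = tree_rounds(n, arity=2)
print(seq, win, tree)  # prints: 3999 125 11
```

The sliding window only divides the linear cost by a constant, while the tree changes how the cost grows with n.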
Broadcast of system images
◮ Send from server node to every client?
◮ Use P2P?
In Kadeploy: topology-aware chained broadcast
◮ Limiting factor: backplane bandwidth of switches
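A back-of-the-envelope model shows why a chain beats sending from the server to every client. This is a sketch under stated assumptions, not a measurement: all links have the same bandwidth, latency and disk writes are ignored, the image is cut into fixed-size chunks that each node forwards to its successor while receiving, and the function names and numbers are hypothetical. It also ignores the switch backplane limit that the slide names as the real bottleneck.

```python
import math


def star_time(n_nodes, image_size, bandwidth):
    """Server sends one full copy per client through its single uplink."""
    return n_nodes * image_size / bandwidth


def chain_time(n_nodes, image_size, bandwidth, chunk_size):
    """Pipelined chain: server -> node 1 -> node 2 -> ... -> node n.

    The server needs n_chunks slots to emit the image; the last chunk
    then crosses the remaining n_nodes - 1 hops, one slot per hop.
    """
    n_chunks = math.ceil(image_size / chunk_size)
    slot = chunk_size / bandwidth  # time to push one chunk over one link
    return (n_chunks + n_nodes - 1) * slot


# Hypothetical figures: 100 nodes, 1000 MB image, 100 MB/s links, 1 MB chunks.
star = star_time(100, 1000, 100)        # 1000.0 seconds
chain = chain_time(100, 1000, 100, 1)   # about 10.99 seconds
```

In this model the star grows linearly with the number of nodes, whereas each extra node in the chain adds only one chunk slot. Ordering the chain to follow the network topology keeps neighbouring hops on the same switch where possible.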
Testing the scalability of Kadeploy
◮ Rather specific requirements
  ◮ Many reinstallable nodes (infrastructure + deployed nodes)
  ◮ DHCP server
◮ Testbed: Grid’5000 - http://www.grid5000.fr/
  ◮ Testbed for research on distributed systems: HPC, Grids, P2P, Cloud
  ◮ 10 sites, 25 clusters, 1300 nodes, 7400 cores
  ◮ Unique features including:
    ◮ Hardware-as-a-Service Cloud: redeployment of OS on the bare metal by users (using Kadeploy)
    ◮ Dedicated backbone network
    ◮ KaVLAN: network isolation
◮ Still not enough nodes → virtual machines (KVM) on all nodes
[Figure: map of Grid’5000 sites (Lille, Luxembourg, Reims, Nancy, Rennes, Lyon, Grenoble, Bordeaux, Toulouse, Sophia), scale bar 800 km. Labels: isolated L2 network; 3-18 VMs per node. Physical (P) and virtual (V) node counts as extracted: Lille (P: 60, V: 428); Nancy, Rennes (P: 336, V: 2160) (P: 102, V: 790); Sophia (P: 137, V: 661). Totals: Physical: 635, Virtual: 3999.]