Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid’5000 Testbed Sébastien Badia, Alexandra Carpen-Amarie, Adrien Lèbre, Lucas Nussbaum Grid’5000 S. Badia, A. Carpen-Amarie, A. Lèbre, L. Nussbaum Testing IaaS Clouds on Grid’5000 1 / 24

Testing IaaS cloud stacks
◮ IaaS Cloud stacks: complex software
◮ Need to be tested in realistic setups
◮ But testing is often limited to:
  - Single-machine installations
  - Static deployments
This talk: enabling large-scale testing of IaaS Cloud stacks on a shared, reconfigurable testbed

Outline
1 Quick overview of the Grid’5000 testbed
2 Support for Virtualization and Cloud on Grid’5000
3 Deploying IaaS Clouds on Grid’5000

Grid’5000
◮ Testbed for research on distributed systems
  - High Performance Computing
  - Grids
  - Peer-to-peer systems
  - Cloud computing
◮ History:
  - 2003: Project started (ACI GRID)
  - 2005: Opened to users
◮ Funding: Inria, CNRS and many local entities (regions, universities)
◮ Only for research on distributed systems → no production usage
  Litmus test: are you interested in the result of the computation?
  - Free nodes during daytime to prepare experiments
  - Large-scale experiments during nights and weekends
◮ Also a scientific object: how does one design such a testbed?
[Figure: software stack layers, top to bottom: Application; Programming environment; Application runtime; Grid, Cloud or P2P middleware; Operating system; Networking]

Leading to results in several fields
Cloud: Sky computing on FutureGrid and Grid’5000
◮ Nimbus cloud deployed on 450+ nodes
◮ Grid’5000 and FutureGrid connected using ViNe
HPC: factorization of RSA-768
◮ Feasibility study: prove that it can be done
◮ Different hardware → understand the performance characteristics of the algorithms
Grid: evaluation of the gLite grid middleware
◮ Fully automated deployment and configuration on 1000 nodes (9 sites, 17 clusters)

Current status
◮ 11 sites (1 outside France)
◮ 26 clusters
◮ 1700 nodes
◮ 7400 cores
◮ Diverse technologies:
  - Intel (60%), AMD (40%)
  - CPUs with 1 to 12 cores
  - Myrinet, Infiniband {S,D,Q}DR
  - Two GPU clusters
◮ 500+ users per year
[Figure: map of sites: Lille, Luxembourg, Reims, Orsay, Nancy, Rennes, Lyon, Bordeaux, Grenoble, Toulouse, Sophia]

Backbone network
Dedicated 10 Gbps backbone provided by RENATER (the French NREN)
Work in progress:
◮ packet-level and flow-level monitoring
◮ bandwidth reservation and limitation

Using Grid’5000: the user’s point of view
◮ Key tool: SSH
◮ Private network: connect through access machines
◮ Data storage: NFS (one server per Grid’5000 site)
[Figure: the user connects over SSH to a site access machine (e.g. access.nancy.grid5000.fr, access.lyon.grid5000.fr, access.orsay.grid5000.fr, access.grenoble.grid5000.fr, access.sophia.grid5000.fr), then to the site frontend (e.g. frontend.lyon aka lyon), where OARSUB and KADEPLOY are available. Site clusters/nodes (e.g. capricorne-12.lyon, grelon-32.nancy, genepi-21.grenoble, gdx-102.orsay, azur-42.sophia) are reached with OARSUB, OARSH or SSH. Sites are interconnected by the Grid'5000 dedicated backbone.]

Resource management with OAR
◮ Batch scheduler with specific features
  - interactive jobs
  - advance reservations
  - powerful resource matching
◮ Resource hierarchy: cluster / switch / node / cpu / core
◮ Properties: memory size, disk type & size, hardware capabilities, network interfaces, ...
◮ Other kinds of resources: VLANs, IP ranges for virtualization
Example: I want 1 core on 2 nodes of the same cluster with 4096 MB of memory and Infiniband 10G, plus 1 cpu on 2 nodes of the same switch with dual-core processors, for a walltime of 4 hours:
  oarsub -I -l "{memnode=4096 and ib10g='YES'}/cluster=1/nodes=2/core=1+{cpucore=2}/switch=1/nodes=2/cpu=1,walltime=4:0:0"
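
The structure of the `-l` argument above (property filter in braces, then a path down the resource hierarchy, groups joined with `+`, walltime appended after a comma) can be sketched as a small string builder. This is only an illustration: `oar_request` is a hypothetical helper, not part of OAR, and it merely reproduces the string shown on the slide.

```python
def oar_request(groups, walltime):
    """Compose an oarsub -l argument (hypothetical helper, not an OAR API).

    groups: list of (property_filter, hierarchy) pairs, where hierarchy is
    a list of (level, count) pairs ordered from coarsest to finest level.
    """
    parts = []
    for properties, hierarchy in groups:
        # Hierarchy levels are written level=count and separated by "/".
        levels = "/".join(f"{name}={count}" for name, count in hierarchy)
        # The optional property filter goes in braces before the hierarchy.
        prefix = f"{{{properties}}}/" if properties else ""
        parts.append(prefix + levels)
    # Resource groups are joined with "+"; walltime follows after a comma.
    return "+".join(parts) + f",walltime={walltime}"


request = oar_request(
    [
        ("memnode=4096 and ib10g='YES'",
         [("cluster", 1), ("nodes", 2), ("core", 1)]),
        ("cpucore=2",
         [("switch", 1), ("nodes", 2), ("cpu", 1)]),
    ],
    walltime="4:0:0",
)
print(request)
```

The resulting string would be passed as `oarsub -I -l "<request>"` on a site frontend.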

Resource management with OAR - visualization
[Figures: resources status view; Gantt chart of reservations]

Description, selection, verification of resources
◮ Describing resources → understand results
  - Detailed description on the Grid’5000 wiki
  - Machine-parsable format (JSON)
◮ Selecting resources
  - OAR database filled from JSON
  oarsub -p "wattmeter='YES' and gpu='YES'"
◮ Verifying resources
  - G5K-checks: validates resources against their description (detects hardware failures and misconfigurations at each boot)
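
The describe / select / verify cycle above can be illustrated with a toy example. Everything here is an assumption made for illustration: the JSON layout, the node names and the `select` and `check` helpers are hypothetical and do not reflect the actual Grid’5000 description format or the G5K-checks implementation.

```python
import json

# Hypothetical machine-parsable description (the real schema is not shown
# in the slides).
DESCRIPTION = json.loads("""
{
  "node-1": {"wattmeter": "YES", "gpu": "YES", "memnode": 4096},
  "node-2": {"wattmeter": "NO",  "gpu": "YES", "memnode": 4096},
  "node-3": {"wattmeter": "YES", "gpu": "NO",  "memnode": 8192}
}
""")


def select(description, **wanted):
    """Return the names of nodes whose properties match all wanted values."""
    return sorted(
        name
        for name, props in description.items()
        if all(props.get(key) == value for key, value in wanted.items())
    )


def check(described, observed):
    """Return {property: (described, observed)} for every mismatch."""
    return {
        key: (value, observed.get(key))
        for key, value in described.items()
        if observed.get(key) != value
    }


# Selection: same intent as oarsub -p "wattmeter='YES' and gpu='YES'".
selected = select(DESCRIPTION, wattmeter="YES", gpu="YES")

# Verification: a node that boots with less memory than described.
mismatches = check(
    DESCRIPTION["node-3"],
    {"wattmeter": "YES", "gpu": "NO", "memnode": 4096},
)
print(selected, mismatches)
```

Keeping a single description as the source for both selection and verification is what lets a mismatch be detected automatically rather than discovered during an experiment.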

Reconfiguring the testbed with Kadeploy
◮ Provides a Hardware-as-a-Service Cloud infrastructure
◮ Enables users to deploy their own software stack & get root access
◮ Standard environments provided to users
  - Customizations automated using Kameleon
◮ Scalable, efficient, reliable and flexible:
  - Chain-based and BitTorrent environment broadcast
  - 255 nodes deployed in 3 minutes
◮ Command-line interface & REST API for scripting
http://kadeploy3.gforge.inria.fr/
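
To give an intuition for why a chain-based broadcast scales, here is a deliberately simplified model, not Kadeploy's actual implementation: the image is split into equal chunks, and at each time step every node forwards one chunk to its successor in the chain. The function name and the chunk count are assumptions chosen for illustration.

```python
def chain_broadcast_steps(num_nodes, num_chunks):
    """Count the steps of an idealized pipelined chain broadcast.

    Index 0 is the server holding every chunk; indexes 1..num_nodes are the
    target nodes. In one step, each link moves at most one chunk one hop.
    """
    have = [num_chunks] + [0] * num_nodes
    steps = 0
    while have[-1] < num_chunks:
        # Walk from the tail so a chunk advances only one hop per step:
        # have[i - 1] is read before it is updated within the same step.
        for i in range(num_nodes, 0, -1):
            if have[i - 1] > have[i]:
                have[i] += 1
        steps += 1
    return steps


# In this model the total is num_chunks + num_nodes - 1 steps, so adding
# nodes adds only one step each instead of a full copy of the image.
print(chain_broadcast_steps(255, 100))
```

Under these assumptions the cost of the broadcast is dominated by the image size rather than by the number of nodes.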

Customizing the experimental environment
◮ Reconfigure experimental conditions with Distem
  - Introduce heterogeneity in a homogeneous cluster
  - Emulate complex network topologies
[Figure: CPU cores 0 to 7 of a physical machine allocated to virtual nodes VN 1 to VN 4 with different CPU performance levels; emulated network topology of nodes n1 to n5 whose interfaces (if0, if1) have per-direction bandwidth and latency settings, e.g. 5 Mbps / 10 ms, 4 Mbps / 12 ms, 10 Mbps / 5 ms, 6 Mbps / 16 ms]
http://distem.gforge.inria.fr/

Virtualization & Cloud experiment requirements
◮ Efficient provisioning of machines
  - Kadeploy
◮ IP addresses for Virtual Machines
◮ Two different solutions on Grid’5000:
  - G5K-Subnets
  - KaVLAN

Network reservation with G5K-subnets
◮ Grid’5000 enables different users to run experiments concurrently
  - Need a mechanism to provide IP ranges for virtual machines
◮ G5K-subnets adds IP range reservation to OAR
  oarsub -l slash_22=2+nodes=8 -I
◮ IP ranges are routable inside Grid’5000
◮ But no isolation: one can steal IP addresses
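
To make the size of such a reservation concrete: `slash_22=2` asks for two /22 ranges, each of which contains 1024 addresses. The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/16` pool is an arbitrary placeholder, not the range actually used on Grid’5000, and in practice the reserved ranges are assigned by OAR.

```python
import ipaddress
from itertools import islice

# Placeholder address pool (assumption for illustration only).
pool = ipaddress.ip_network("10.0.0.0/16")

# Take the first two /22 blocks, mimicking a slash_22=2 reservation.
reserved = list(islice(pool.subnets(new_prefix=22), 2))

# Each /22 spans 2**(32 - 22) = 1024 addresses.
total_addresses = sum(net.num_addresses for net in reserved)

# Hand out the first few usable host addresses to virtual machines.
vm_ips = [str(ip) for ip in islice(reserved[0].hosts(), 3)]

print([str(net) for net in reserved], total_addresses, vm_ips)
```

Because nothing enforces these assignments at the network level, a virtual machine configured with an address from someone else's range would still work, which is the lack of isolation noted above.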

Network isolation with KaVLAN
◮ Reconfigures switches for the duration of a user experiment to achieve complete layer 2 isolation:
  - Avoid network pollution (broadcast, unsolicited connections)
  - Enable users to start their own DHCP servers
  - Experiment on Ethernet-based protocols
  - Interconnect nodes with another testbed without compromising the security of Grid’5000
◮ Relies on 802.1Q (VLANs)
◮ Compatible with a wide range of network equipment
  - Can use SNMP, SSH or telnet to connect to switches
  - Supports Cisco, HP, 3Com, Extreme Networks and Brocade
◮ Controlled with a command-line client or a REST API
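
As background on the 802.1Q mechanism KaVLAN relies on: a VLAN-tagged Ethernet frame carries a 4-byte tag made of a 16-bit TPID (0x8100) and a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible flag and a 12-bit VLAN ID. The sketch below only packs and parses that tag; the function names are hypothetical, and it does not show how KaVLAN itself configures switches.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier for 802.1Q


def vlan_tag(vlan_id, priority=0, dei=0):
    """Build the 4-byte 802.1Q tag for the given VLAN ID."""
    if not 1 <= vlan_id <= 4094:  # 0 and 4095 are reserved values
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be in 0..7 and dei 0 or 1")
    # TCI layout: priority (3 bits) | DEI (1 bit) | VLAN ID (12 bits).
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Decode a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {
        "priority": tci >> 13,
        "dei": (tci >> 12) & 1,
        "vlan_id": tci & 0x0FFF,
    }


tag = vlan_tag(42)
print(tag.hex(), parse_vlan_tag(tag))
```

Switches forward a tagged frame only to ports that belong to the same VLAN ID, which is what keeps broadcast traffic and user-run DHCP servers confined to one experiment.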