CHAMELEON: A LARGE SCALE, RECONFIGURABLE EXPERIMENTAL INSTRUMENT FOR COMPUTER SCIENCE
Kate Keahey
Computation Institute, University of Chicago
Argonne National Laboratory
keahey@anl.gov
APRIL 23, 2018
www.chameleoncloud.org
INTRODUCING CHAMELEON
Deeply Reconfigurable Instrument for Computer Science Research
- Support for isolation, bare metal reconfiguration, custom kernel reboot, console access, etc.
Large-scale Experimental Infrastructure
- Total of ~650 nodes (~14,500 cores), 5 PB of storage distributed over 2 sites connected with 100G network
- Large-scale homogeneous partition
- Heterogeneous hardware: InfiniBand, FPGAs, GPUs, ARMs, Atoms, etc.
- Support for large-scale in capabilities and policies
Developed primarily on top of commodity open source system
- Leverages community investment in the project
- Contributes to development: revival of Blazar, contributions to Ironic, Nova, implementation of snapshotting, dynamic VLANs, etc.
- Interacts with the community via the scientific working group
CHAMELEON HARDWARE
[Figure: hardware diagram of the two Chameleon sites, Chicago and Austin]
- Chameleon Core Network: 100Gbps uplink to the public network (each site); links to UTSA, GENI, Future Partners
- Standard Cloud Unit (SCU): 42 compute and 4 storage nodes per unit, shown x2 and x10; SCUs connect to the core and are fully connected to each other
- Heterogeneous Cloud Units: Alternate Processors and Networks
- Core Services: 3.6 PB Central File Systems, Front End and Data Mover Nodes
- Totals: 504 x86 Compute Servers, 48 Dist. Storage Servers, 102 Heterogeneous Servers, 16 Mgt and Storage Nodes
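The node counts on the hardware slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: it takes the figures as printed on the slide ("42 compute", "4 storage", "x2", "x10", "102 Heterogeneous Servers", "16 Mgt and Storage Nodes") and assumes that the two multipliers are Standard Cloud Unit counts that simply add up. The constant and function names are made up for this example.

```python
# Figures copied from the hardware slide. Treating "x2" and "x10" as
# Standard Cloud Unit (SCU) counts that add up is an assumption.
SCU_COMPUTE_NODES = 42
SCU_STORAGE_NODES = 4
SCU_COUNTS = (2, 10)  # the "x2" and "x10" multipliers in the diagram
HETEROGENEOUS_SERVERS = 102
MGMT_AND_STORAGE_NODES = 16


def hardware_totals():
    """Return node totals derived from the per-unit figures."""
    scus = sum(SCU_COUNTS)
    compute = scus * SCU_COMPUTE_NODES
    storage = scus * SCU_STORAGE_NODES
    all_nodes = compute + storage + HETEROGENEOUS_SERVERS + MGMT_AND_STORAGE_NODES
    return {"scus": scus, "compute": compute, "storage": storage, "all_nodes": all_nodes}


if __name__ == "__main__":
    for name, value in hardware_totals().items():
        print(f"{name}: {value}")
```

Under that assumption, the derived compute and storage totals match the "504 x86 Compute Servers" and "48 Dist. Storage Servers" labels. The sum over all four categories comes to 670, the same order as the "~650 nodes" quoted earlier; the slides do not say which categories that rounded figure counts.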
EXPERIMENTAL WORKFLOW REQUIREMENTS
discover resources
- Fine-grained
- Complete
- Up-to-date
- Versioned
- Verifiable
provision resources
- Advance reservations & on-demand
- Isolation
- Fine-grained allocations
configure and interact
- Deeply reconfigurable
- Appliance catalog
- Snapshotting
- Complex Appliances
- Network Isolation
monitor
- Hardware metrics
- Fine-grained information
- Aggregate and archive
CHAMELEON IMPLEMENTATION
[Figure: implementation diagram]
- Service groups: Resource Discovery Services, Configuration Services, Monitoring Services, User Management Services, Appliance Catalog Services
- Components shown: Web portal, Horizon, TAS (TACC), Keystone, Allocation Management, Appliance Catalog, Grid’5000 Resource Discovery, Blazar, Ironic, Nova, Neutron, Swift, Glance, Heat, Ceilometer, Request Tracker, Chameleon Instance Utilities, Agents, Clients
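Many of the labels on the implementation slide are OpenStack projects. As a reading aid, the sketch below records the general role each of those projects plays in OpenStack. The role descriptions are general OpenStack background rather than statements from the slide, the names `OPENSTACK_ROLES` and `role_of` are made up for this example, and how Chameleon wires these components together is not shown here.

```python
# General OpenStack project roles (background knowledge, not taken from
# the slide). Keys are component labels that appear in the diagram.
OPENSTACK_ROLES = {
    "Horizon": "web dashboard",
    "Keystone": "identity and authentication",
    "Blazar": "resource reservation",
    "Ironic": "bare-metal provisioning",
    "Nova": "compute instance management",
    "Neutron": "networking",
    "Glance": "image storage and registry",
    "Swift": "object storage",
    "Heat": "orchestration",
    "Ceilometer": "telemetry and metering",
}


def role_of(component):
    """Return the OpenStack role for a diagram label, or None if the
    label is not one of the OpenStack projects listed above."""
    wanted = component.strip().lower()
    for name, role in OPENSTACK_ROLES.items():
        if name.lower() == wanted:
            return role
    return None


if __name__ == "__main__":
    for label in ("Blazar", "Ironic", "Grid'5000 Resource Discovery"):
        print(label, "->", role_of(label))
```

Labels such as TAS (TACC), Grid’5000 Resource Discovery, and Request Tracker are not OpenStack projects, so the lookup returns `None` for them.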
CHALLENGES AND LESSONS LEARNED
Building on top of a commodity open source project
- Significant advantages in terms of direct and indirect community investment
- Advantages for long-term maintenance
We need more than a testbed to support CS research
- Traces and workloads, research data
- Tools for repeatability
New concept: myths and misperceptions
- Not true: “only available to users with NSF allocation”
- Not true: “they use OpenStack so it is VMs”
- Not true: “can’t get an experiment with 100s of nodes”
Managing incentives
- Balancing individual versus community needs: allocations and lease limits
- Resource scarcity
DISCUSSION
Focus on specific users/scenarios/benefits
- Specific benefits drive involvement
Balancing incentives to federate
- Science has no borders
- But: resources are finite, and stakeholder interests need to be respected
Building on common needs
- Common data sharing services?
- Traces and workloads