Open Cirrus™: A Global Testbed for Cloud Computing Research
David O'Hallaron
Director, Intel Labs Pittsburgh
Carnegie Mellon University
Open Cirrus Testbed
http://opencirrus.intel-research.net
• Sponsored by HP, Intel, and Yahoo! (with additional support from NSF).
• 9 sites worldwide, with a target of around 20 in the next two years.
• Each site has 1,000-4,000 cores.
• Shared hardware infrastructure (~15K cores), services, research, and applications.
Open Cirrus Context
Goals
1. Foster new systems and services research around cloud computing.
2. Catalyze an open-source stack and APIs for the cloud.
Motivation
— Enable more tier-2 and tier-3 public and private cloud providers.
How are we different?
— Support for systems research and applications research
  • Access to bare metal; integrated virtual-physical migration.
— Federation of heterogeneous datacenters
  • Global sign-on, monitoring, and storage services.
Intel BigData Cluster
Open Cirrus site hosted by Intel Labs Pittsburgh
— Operational since January 2009.
— 180 nodes, 1,440 cores, 1,416 GB DRAM, 500 TB disk.
Supporting 50 users and 20 projects from CMU, Pitt, Intel, and GaTech
— Systems research: cluster management, location- and power-aware scheduling, physical-virtual migration (Tashi), cache-savvy algorithms (Hi-Spade), real-time streaming frameworks (SLIPstream), optical datacenter interconnects (CloudConnect), log-based architectures (LBA).
— Applications research: machine translation, speech recognition, programmable matter simulation, ground model generation, online education, real-time brain activity decoding, real-time gesture and object recognition, federated perception, automated food recognition.
Idea for a research project on Open Cirrus?
— Send a short email abstract to Mike Kozuch, Intel Labs Pittsburgh, michael.a.kozuch@intel.com
Open Cirrus Stack
At the base of the stack, the Physical Resource Set (PRS) service manages the compute, network, and storage resources, the management and control subsystem, and power and cooling.
Credit: John Wilkes (HP)
Open Cirrus Stack
On top of the PRS service sit PRS clients, each with their own "physical data center": research, Tashi, an NFS storage service, and an HDFS storage service.
Open Cirrus Stack
Virtual clusters (e.g., Tashi) are carved out on top of the PRS clients (research, Tashi, NFS storage service, HDFS storage service).
Open Cirrus Stack
Layering, from top to bottom:
1. A BigData application running
2. on Hadoop,
3. on a Tashi virtual cluster,
4. on a PRS,
5. on real hardware.
Open Cirrus Stack
Successive layers are added to the stack: experiment save/restore, then platform services, then user services, alongside the BigData application, Hadoop, the virtual clusters, and the underlying PRS service.
System Organization
Compute nodes are divided into dynamically-allocated, VLAN-isolated PRS subdomains: open service research, applications running in VMs, management infrastructure development (e.g., Tashi), a production storage service, proprietary services, and open workload monitoring and trace collection research.
Applications switch back and forth between virtual and physical resources.
Open Cirrus Stack - PRS
PRS service goals
— Provide mini-datacenters to researchers.
— Isolate experiments from each other.
— Provide a stable base for other research.
PRS service approach
— Allocate sets of co-located physical nodes, isolated inside VLANs.
PRS code from HP Labs is being merged into the Apache Tashi project.
Credit: Kevin Lai (HP), Richard Gass, Michael Ryan, Michael Kozuch, and David O'Hallaron (Intel)
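The deck does not show the PRS interface itself; the sketch below only illustrates the allocation concept, and its class, function, and field names are invented for illustration (a real PRS allocation would also program the switches that enforce the VLAN isolation).

# Hypothetical sketch of what a PRS allocation might look like to a researcher;
# the names below are invented for illustration, not the real PRS API.
from dataclasses import dataclass
from typing import List

@dataclass
class PhysicalResourceSet:
    name: str            # e.g., an experiment or project name
    vlan_id: int         # the VLAN fencing this mini-datacenter off from others
    nodes: List[str]     # hostnames of the co-located physical nodes

def allocate_prs(free_nodes: List[str], count: int, name: str, vlan_id: int) -> PhysicalResourceSet:
    """Grab `count` co-located nodes from the free pool and fence them in a VLAN."""
    if len(free_nodes) < count:
        raise RuntimeError("not enough free physical nodes for this PRS")
    granted = [free_nodes.pop() for _ in range(count)]
    return PhysicalResourceSet(name=name, vlan_id=vlan_id, nodes=granted)

# Example: carve a 4-node mini-datacenter out of a small free pool.
pool = ["node%02d" % i for i in range(10)]
prs = allocate_prs(pool, count=4, name="hadoop-experiment", vlan_id=210)
print(prs.name, prs.vlan_id, prs.nodes)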
Open Cirrus Stack - Tashi
An open-source Apache Software Foundation incubator project sponsored by Intel, CMU, and HP.
Research infrastructure for cloud computing on big data
— Implements the AWS interface.
— In daily production use on the Intel cluster for 6 months:
  • Manages a pool of 80 physical nodes.
  • ~20 projects / 40 users from CMU, Pitt, and Intel.
— http://incubator.apache.org/projects/tashi
Research focus
— Location-aware co-scheduling of VMs, storage, and power.
— Integrated physical/virtual migration (using PRS).
Credit: Mike Kozuch, Michael Ryan, Richard Gass, Dave O'Hallaron (Intel); Greg Ganger, Mor Harchol-Balter, Julio Lopez, Jim Cipar, Elie Krevat, Anshul Gandhi, Michael Stroucken (CMU)
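Because Tashi exposes an AWS-style interface, a standard EC2 client library can in principle drive it. The sketch below uses the boto library against a hypothetical endpoint; the host name, port, path, credentials, and image ID are placeholders, and the options a given Tashi deployment expects may differ.

# Sketch: launching a VM through an EC2-compatible endpoint such as Tashi's AWS interface.
import boto
from boto.ec2.regioninfo import RegionInfo

# All of the following values are placeholders, not taken from the deck.
region = RegionInfo(name="opencirrus", endpoint="tashi.example.org")
conn = boto.connect_ec2(aws_access_key_id="RESEARCHER_KEY",
                        aws_secret_access_key="RESEARCHER_SECRET",
                        is_secure=False, port=8773, path="/",
                        region=region)

# Ask for one small instance from a placeholder machine image and report its state.
reservation = conn.run_instances("ami-00000000", min_count=1, max_count=1,
                                 instance_type="m1.small")
instance = reservation.instances[0]
print("Requested instance %s, state %s" % (instance.id, instance.state))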
Tashi High-Level Design
• Services are instantiated through virtual machines.
• Most decisions happen in the scheduler, which manages compute, storage, and power in concert.
• Data location and power information is exposed to the scheduler and services.
• The storage service aggregates the capacity of the commodity nodes to house big-data repositories.
• The cluster manager (CM) maintains databases and routes messages; its decision logic is limited.
• Cluster nodes are assumed to be commodity machines.
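The location-aware co-scheduling idea can be illustrated with a small placement function. This is an illustrative sketch, not Tashi's scheduler code: given the blocks a workload will read and where their replicas live, prefer the feasible host that already holds the most of that data.

# Illustrative sketch of location-aware VM placement (not Tashi's actual code).
from typing import Dict, List, Set

def choose_host(candidate_hosts: List[str],
                free_cores: Dict[str, int],
                block_locations: Dict[str, Set[str]],   # block id -> hosts holding a replica
                needed_blocks: Set[str],
                cores_required: int) -> str:
    """Pick the feasible host that holds the most needed data locally."""
    best_host, best_local = None, -1
    for host in candidate_hosts:
        if free_cores.get(host, 0) < cores_required:
            continue  # not enough compute capacity on this host
        local = sum(1 for b in needed_blocks if host in block_locations.get(b, set()))
        if local > best_local:
            best_host, best_local = host, local
    if best_host is None:
        raise RuntimeError("no host with enough free cores")
    return best_host

# Example: host "n2" holds both blocks the job needs, so it wins over "n1".
print(choose_host(["n1", "n2"],
                  free_cores={"n1": 8, "n2": 8},
                  block_locations={"b1": {"n2"}, "b2": {"n1", "n2"}},
                  needed_blocks={"b1", "b2"},
                  cores_required=4))  # -> "n2"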
Location Matters (calculated)
Calculated throughput per disk (MB/s) for a 40-rack × 30-node × 2-disk configuration, comparing random vs. location-aware placement across Disk-1G, SSD-1G, Disk-10G, and SSD-10G setups. Location-aware placement improves per-disk throughput by factors ranging from roughly 3.5× to 11×.
Location Matters (measured)
Measured throughput per disk (MB/s) for a 2-rack × 14-node × 6-disk configuration, comparing random vs. location-aware placement on two benchmark workloads (labeled "ssh" and "xinetd"). Location-aware placement improves per-disk throughput by roughly 2.9× and 4.7×.
Open Cirrus Stack – Hadoop
An open-source Apache Software Foundation project sponsored by Yahoo!
— http://wiki.apache.org/hadoop/ProjectDescription
Provides a parallel programming model (MapReduce) and a distributed file system (HDFS).
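As a concrete illustration of the MapReduce model (not from the original deck), here is a minimal word-count job for Hadoop Streaming, which runs plain scripts as the map and reduce phases over data in HDFS. The input/output paths and jar location in the usage comment are illustrative.

#!/usr/bin/env python
# Minimal Hadoop Streaming word count: the same script acts as mapper or reducer.
# Illustrative invocation (paths and jar name are placeholders):
#   hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
#       -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" \
#       -file wordcount.py
import sys

def mapper():
    # Emit one "word<TAB>1" line per word; Hadoop sorts by key between phases.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word.lower())

def reducer():
    # Input arrives grouped by key; sum counts for each consecutive run of a word.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()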
Typical Web Service
An external client sends an HTTP query to application servers in the data center, which consult backend databases and return the result.
Characteristics:
• Small queries and results.
• Little client computation.
• Moderate server computation.
• Moderate data accessed per query.
Examples: web sites serving dynamic content.
Big Data Service
An external client sends a query to a data-intensive computing system (e.g., Hadoop): parallel compute, query, and data servers over a parallel file system (e.g., GFS, HDFS) that holds a source dataset and derived datasets fed from external data sources.
Characteristics:
• Small queries and results.
• Massive data and computation performed on the server.
Examples: search, photo scene completion, log processing, science analytics.
Streaming Data Service
An external client with sensors sends a continuous stream of queries and data to parallel compute, query, and data servers (backed by source and derived datasets plus external data sources) and receives continuous query results.
Characteristics:
• The application lives on the client.
• The client uses the cloud as an accelerator.
• Data is transferred with the query.
• Variable, latency-sensitive HPC on the server.
• Often combined with a Big Data service.
Examples: perceptual computing on high data-rate sensors, such as real-time brain activity detection, object recognition, and gesture recognition.
Streaming Data Service: Gestris – Interactive Gesture Recognition
Two-player "Gestris" (gesture Tetris) implementation; an arm gesture selects the action.
• 2 video sources.
• Uses a simplified volumetric event detection algorithm.
• 10 cores at 3 GHz each: 1 for camera input and scaling, 1 for the game and display, and 8 for volumetric matching (4 for each video stream).
• Achieves the full 15 fps rate.
Credit: Lily Mummert, Babu Pillai, Rahul Sukthankar (Intel), Martial Hebert, Pyry Matikainen (CMU)
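To make the parallel matching stage concrete (an illustrative sketch only, not the SLIPstream or Gestris code), the following fans frames from two simulated streams out to a pool of eight worker processes, mirroring the eight cores dedicated to volumetric matching; detect_gesture is a stand-in for the real detector.

# Illustrative sketch of the fan-out pattern behind the parallel matching stage.
from multiprocessing import Pool

def detect_gesture(frame):
    # Placeholder for volumetric matching on one frame of one video stream.
    stream_id, frame_no, pixels = frame
    score = sum(pixels) % 100          # dummy computation standing in for matching
    return (stream_id, frame_no, score)

def main():
    # Two simulated video streams, a few tiny frames each.
    frames = [(s, n, [((s + n + i) % 7) for i in range(64)])
              for s in range(2) for n in range(4)]
    # 8 workers, matching the 8 cores dedicated to volumetric matching in Gestris.
    with Pool(processes=8) as pool:
        for stream_id, frame_no, score in pool.imap(detect_gesture, frames):
            print("stream %d frame %d -> match score %d" % (stream_id, frame_no, score))

if __name__ == "__main__":
    main()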
Streaming Data Meets Big Data: Real-time Brain Activity Decoding
• Magnetoencephalography (MEG) measures the magnetic fields associated with brain activity.
• Its temporal and spatial resolution offers unprecedented insights into brain dynamics.
(Images: MEG and ECoG recording.)
Credit: Dean Pomerleau (Intel), Tom Mitchell, Gus Sudre, and Mark Palatucci (CMU), Wei Wang, Doug Weber, and Anto Bagic (UPitt)