  1. A New Community Resource for Experiments at Scale: PRObE
     Garth Gibson, Carnegie Mellon University; Gary Grider, Los Alamos National Laboratory; Katharine Chartrand, New Mexico Consortium; Andree Jacobson, New Mexico Consortium

  2. LANL is "giving us" Lightning

  3. NSF Funds NMC to Recycle
     • NSF funds PRObE (2011-2014)
     • Parallel Reconfigurable Observational Environment
     • Large-scale clusters for systems researchers
     • For dedicated use over long periods of time (days, weeks)
     • Allows replacement of any and all software

  4. Hardware Plan
     • Fall 2011: Sitka (2048 cores), allocated
       – 1024 nodes, dual-socket, single-core AMD Opteron; 2 GB per core; Myrinet
     • Fall 2012: Kodiak (2048 cores), identified
       – 1024 nodes, dual-socket, single-core AMD Opteron; 4 GB per core; SDR InfiniBand
     • Fall 2013: Nome (1600 cores)
       – 200 nodes, quad-socket, dual-core AMD Opteron; 2 GB per core; DDR InfiniBand
     • Plus Ethernet and a fat-tree high-speed interconnect

  5. Hardware Plan II
     • Small (128-node) staging clusters, and
     • Smaller (to be bought new) higher-core-count clusters
       – Summer 2011: Susitna (1728 cores), TBD: 36 nodes, quad-socket, 12-core AMD (?); 1-2 GB RAM per core; EDR InfiniBand high-speed interconnect
       – Summer 2013: Matanuska (3456 cores): 36 nodes, quad-socket, 24-core AMD (?); 1-2 GB RAM per core; 100 Gigabit Ethernet (or similar)
     (The sketch below derives the aggregate core and memory counts for all five clusters from these per-node specs.)
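The cluster core counts on slides 4 and 5 follow directly from nodes x sockets x cores per socket, and aggregate memory from cores x GB per core. A minimal Python sketch, not part of the original talk, works those totals out from the quoted per-node specs; the RAM figures are computed here rather than stated on the slides, and Susitna/Matanuska assume the 2 GB end of the quoted "1-2 GB per core".

```python
"""Derive aggregate core and RAM counts for the PRObE clusters from the
per-node specs on the Hardware Plan slides. The node counts, sockets,
cores, and GB-per-core values come from the slides; the totals printed
here are simple arithmetic, not numbers stated in the talk."""

CLUSTERS = {
    # name: (nodes, sockets_per_node, cores_per_socket, GB_per_core)
    "Sitka":     (1024, 2,  1, 2),   # Fall 2011, Myrinet
    "Kodiak":    (1024, 2,  1, 4),   # Fall 2012, SDR InfiniBand
    "Nome":      (200,  4,  2, 2),   # Fall 2013, DDR InfiniBand
    "Susitna":   (36,   4, 12, 2),   # Summer 2011, assuming 2 GB/core
    "Matanuska": (36,   4, 24, 2),   # Summer 2013, assuming 2 GB/core
}

for name, (nodes, sockets, cores, gb_per_core) in CLUSTERS.items():
    total_cores = nodes * sockets * cores
    total_ram_gb = total_cores * gb_per_core
    print(f"{name:10s} {total_cores:5d} cores  {total_ram_gb:6d} GB RAM")

# The derived core counts match the slides: Sitka 2048, Kodiak 2048,
# Nome 1600, Susitna 1728, Matanuska 3456.
```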

  6. (No text content on this slide.)

  7. For Systems Research Users
     • NSF "who can apply" rules
     • Includes international and corporate research projects ("best" in partnership with a US university)

  8. Software
     • First, "none" is allowed: researchers can put any software they want onto the clusters
     • Second, a well-known tool for managing clusters of hardware for research
       – Emulab (www.emulab.org), Flux Group, U. Utah
       – On the staging clusters, and also on the large clusters
       – Enhanced for PRObE hardware, scale, networks, resource-partitioning policies, remote power and console, failure injection, and deep instrumentation (a hypothetical failure-injection sketch follows this slide)
     • PRObE provides hardware support (spares)
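Slide 8 lists remote power control and failure injection among the Emulab enhancements. The sketch below is a hypothetical illustration of that idea only, not PRObE's or Emulab's actual tooling: it power-cycles a random subset of nodes through their baseboard management controllers with ipmitool, assuming made-up node names, a nodeNNN-ipmi BMC naming convention, and placeholder credentials.

```python
"""Hypothetical failure-injection sketch: hard-power-cycle a random subset
of cluster nodes via their BMCs using ipmitool. Node names, the BMC naming
convention, and the credentials are assumptions for illustration only."""
import random
import subprocess

# Assumption: a 128-node staging cluster whose BMCs answer at nodeNNN-ipmi.
NODES = [f"node{i:03d}" for i in range(128)]
IPMI_USER, IPMI_PASS = "admin", "secret"   # placeholder credentials

def power_cycle(node: str) -> None:
    """Ask the node's BMC to power-cycle it (simulates a hard node failure)."""
    bmc = f"{node}-ipmi"                   # assumed BMC hostname convention
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc,
         "-U", IPMI_USER, "-P", IPMI_PASS,
         "chassis", "power", "cycle"],
        check=True,
    )

def inject_failures(fraction: float = 0.05, seed: int = 0) -> list[str]:
    """Power-cycle a random fraction of the cluster; return the victims."""
    rng = random.Random(seed)
    victims = rng.sample(NODES, max(1, int(len(NODES) * fraction)))
    for node in victims:
        power_cycle(node)
    return victims

if __name__ == "__main__":
    print("injected failures on:", inject_failures())
```

An experiment would typically log the victims and timing, then observe how the system under test detects and recovers from the induced failures.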

  9. Allocation
     • Competitive (target a few pages per proposal)
     • Justified for research needing PRObE resources
       – Not for cycles; for systems research
       – Results must be published and credit given
     • Low threshold to get onto the staging clusters
     • Emulab procedures wherever appropriate
     • Allocation by community importance/merit
       – Committee recommends order and duration of use
     • Allocation opportunity tokens used to incent usage
       – Prompt return of resources, other contributions
       – Unused time offered to pending projects

  10. PRObE Decision Making
     • Committees of usually about 6 members, selected by standard academic procedures (via BOFs)

  11. Next Steps
     • Identify interested researchers and research
     • Seek candidates to steer (advisory committee)
     • Seek candidates to select the program (project selection committee)
     • Seek candidates to shape the user experience (user environment advisory committee)
     • Seek advice on anything else
     • probe@newmexicoconsortium.org
     • http://newmexicoconsortium.org/probe
