Toward a National Research Platform


  1. “Toward a National Research Platform”
     Invited Presentation, Open Science Grid All Hands Meeting, Salt Lake City, UT, March 20, 2018
     Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
     http://lsmarr.calit2.net

  2. 30 Years Ago, NSF Brought to University Researchers a DOE HPC Center Model (1985/6)
     – SDSC Was Modeled on MFEnet
     – NCSA Was Modeled on LLNL

  3. I-WAY: Information Wide Area Year, Supercomputing ’95 (UIC)
     • The First National 155 Mbps Research Network
       – 65 Science Projects
       – Into the San Diego Convention Center
     • I-WAY Featured:
       – Networked Visualization Applications
       – Large-Scale Immersive Displays
       – I-Soft Programming Environment
       – Led to the Globus Project (See talk by: Brian Bockelman)
     http://archive.ncsa.uiuc.edu/General/Training/SC95/GII.HPCC.html

  4. NSF’s PACI Program Was Built on the vBNS to Prototype America’s 21st Century Information Infrastructure (1997)
     – The PACI Grid Testbed over the vBNS Led to National Computational Science
     – Key Role of Miron Livny & Condor

  5. UCSD Has Been Working Toward the PRP for Over 15 Years: NSF OptIPuter, Quartzite, and Prism Awards Were Precursors to DOE Defining the Science DMZ in 2010
     – OptIPuter: PI Smarr, 2002-2009
     – Quartzite: PI Papadopoulos, 2004-2007
     – Prism: PI Papadopoulos, 2013-2015

  6. Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Funded Over 100 Campuses to Build Science DMZs (NSF Program Officer: Kevin Thompson)
     – Map Legend: Red = 2012 CC-NIE Awardees; Yellow = 2013 CC-NIE Awardees; Green = 2014 CC*IIE Awardees; Blue = 2015 CC*DNI Awardees; Purple = Multiple-Time Awardees
     Source: NSF

  7. Logical Next Step: The Pacific Research Platform Networks Campus DMZs to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System
     • NSF CC*DNI Grant, $5M, 10/2015-10/2020 (NSF Program Officer: Amy Walton)
     • PI: Larry Smarr, UC San Diego Calit2
     • Co-PIs:
       – Camille Crittenden, UC Berkeley CITRIS
       – Tom DeFanti, UC San Diego Calit2/QI
       – Philip Papadopoulos, UCSD SDSC
       – Frank Wuerthwein, UCSD Physics and SDSC
     • Letters of Commitment from:
       – 50 Researchers from 15 Campuses (GDC)
       – 32 IT/Network Organization Leaders
     Source: John Hess, CENIC

  8. Note That the OSG Cluster Map Has Major Overlap with the NSF-Funded DMZ Map (NSF CC* Grants). Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  9. Bringing OSG Software and Services to a Regional-Scale DMZ Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  10. Key PRP Innovation: UCSD-Designed FIONAs (Flash I/O Network Appliances), Big Data Science Data Transfer Nodes (DTNs), Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10/40/100G Networks
      • FIONA PCs [a.k.a. ESnet DTNs]: ~$8,000 Big Data PC with:
        – 1 CPU
        – 10/40 Gbps Network Interface Cards
        – 3 TB SSDs or 100+ TB Disk Drive
      • Extensible for Higher Performance:
        – +NVMe SSDs for 100 Gbps Disk-to-Disk
        – +Up to 8 GPUs [4M GPU Core Hours/Week]
        – +Up to 160 TB Disks for Data Posting
        – +Up to 38 Intel CPUs
      • $700 10 Gbps FIONAs Being Tested
      • FIONettes Are $270 FIONAs: 1 Gbps NIC with USB-3 for Flash Storage or SSD
      • Photo Labels: FIONette (1G, $250); FIONA (10/40G, $8,000)
      Phil Papadopoulos, SDSC & Tom DeFanti, Joe Keefe & John Graham, Calit2

  11. We Measure Disk-to-Disk Throughput with a 10 GB File Transfer Using Globus GridFTP, 4 Times Per Day, in Both Directions, for All PRP Sites
      – From the Start of Monitoring (January 29, 2016) Through July 21, 2017, the Mesh Grew from 12 DTNs to 24 DTNs Connected at 10-40G in 1½ Years
      Source: John Graham, Calit2/QI
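To make the measurement concrete, here is a minimal sketch of the kind of scheduled disk-to-disk probe described on slide 11, assuming the globus-url-copy GridFTP client and valid GSI credentials are in place; the DTN hostnames, port, and test-file path are placeholders rather than actual PRP endpoints, and a real deployment would run this from a scheduler four times per day.

```python
# A minimal sketch of a scheduled disk-to-disk probe, assuming the
# globus-url-copy GridFTP client and valid GSI credentials; the DTN
# hostnames and test-file paths below are placeholders, not PRP endpoints.
import subprocess
import time

DTNS = ["dtn1.example.edu", "dtn2.example.edu"]   # hypothetical DTN hostnames
TEST_FILE = "/data/testfiles/10GB.dat"            # hypothetical 10 GB test file
FILE_GB = 10

def transfer(src_host, dst_host):
    """Run one GridFTP disk-to-disk transfer and return elapsed seconds."""
    src = f"gsiftp://{src_host}:2811{TEST_FILE}"
    dst = f"gsiftp://{dst_host}:2811/data/incoming/10GB.dat"
    start = time.time()
    # -vb prints transfer performance; -p 4 uses four parallel TCP streams.
    subprocess.run(["globus-url-copy", "-vb", "-p", "4", src, dst], check=True)
    return time.time() - start

if __name__ == "__main__":
    # Probe every ordered pair of DTNs, i.e. both directions, as the PRP does.
    for src in DTNS:
        for dst in DTNS:
            if src != dst:
                secs = transfer(src, dst)
                print(f"{src} -> {dst}: {FILE_GB * 8 / secs:.1f} Gbps")
```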

  12. PRP’s First 2 Years: Connecting Multi-Campus Application Teams and Devices
      – Earth Sciences

  13. PRP Over CENIC Couples the UC Santa Cruz Astrophysics Cluster to the LBNL NERSC Supercomputer (CENIC 2018 Innovations in Networking Award for Research Applications)

  14. 100 Gbps FIONA at UCSC Allows Downloads to the UCSC Hyades Cluster from the LBNL NERSC Supercomputer for DESI Science Analysis (Precursors to LSST and NCSA)
      – Data Rates: 300 Images per Night at 100 MB per Raw Image (120 GB per Night) vs. 250 Images per Night at 530 MB per Raw Image (800 GB per Night)
      – Photo: NSF-Funded Cyberengineer Shaw Dong @UCSC Receiving the FIONA
      Source: Peter Nugent, LBNL and Professor of Astronomy, UC Berkeley, Feb 7, 2017

  15. Jupyter Has Become the Digital Fabric for Data Sciences: PRP Creates a UC-JupyterHub Backbone
      – Goal: Jupyter Everywhere
      Source: John Graham, Calit2
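As an illustration of what a campus JupyterHub node in such a backbone might configure, here is a minimal jupyterhub_config.py sketch that spawns user notebooks as Kubernetes pods; CILogon and KubeSpawner are common community plugins chosen here as assumptions, and the container image name is a placeholder, so this is not the actual PRP configuration.

```python
# jupyterhub_config.py: a minimal sketch of a campus JupyterHub that spawns
# user notebooks as Kubernetes pods. CILogonOAuthenticator and KubeSpawner
# are standard JupyterHub plugins; the image name is a placeholder.
c = get_config()  # provided by JupyterHub when it loads this config file

# Federate campus identities; CILogon is one common choice for .edu logins.
c.JupyterHub.authenticator_class = "oauthenticator.CILogonOAuthenticator"

# Launch each user's notebook server as a pod on the Kubernetes cluster.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "example.org/datascience-notebook:latest"  # placeholder image
c.KubeSpawner.cpu_limit = 2
c.KubeSpawner.mem_limit = "8G"
```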

  16. LHCOne Traffic Growth Is Large Now But Will Explode in 2026
      – 31 Petabytes in January 2018, a +38% Change Within the Last Year
      – LHC Accounts for 47% of Total ESnet Traffic Today
      – Dramatic Data Volume Growth Expected for HL-LHC in 2026
      Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  17. Data Transfer Rates from a 40 Gbps DTN in the UCSD Physics Building, Across Campus on the PRISM DMZ, Then to Fermilab in Chicago Over CENIC/ESnet
      – Based on This Success, Würthwein Will Upgrade the 40G DTN to 100G for Bandwidth Tests & Kubernetes Integration with OSG, Caltech, and UCSC
      Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  18. LHC Data Analysis Running on PRP: Two Projects
      • OSG Cluster-in-a-Box for “T3”
      • Distributed XRootD Cache for “T2”
      Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  19. First Steps Toward Integrating OSG and PRP – Tier 3 “Cluster-in-a-Box” Source: Frank Würthwein, OSG, UCSD/SDSC, PRP

  20. PRP Distributed Tier-2 Cache Across Caltech & UCSD
      • Applications Can Connect at the Local or Top-Level Cache Redirector of the Global Data Federation of CMS ⇒ Test the System as an Individual or Joint Cache
      • Provisioned Pilot Systems:
        – PRP UCSD: 9 Cache Servers, Each with 12 x 2 TB SATA Disks @ 10 Gbps per System
        – PRP Caltech: 2 Cache Servers, Each with 30 x 6 TB SATA Disks @ 40 Gbps per System
      • Production Use (UCSD Only): I/O in Production Is Limited by the Number of Apps Hitting the Cache and Their I/O Patterns
      Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
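To show how an analysis job would use such a cache, here is a minimal sketch that pulls a file through an XRootD cache redirector with the standard xrdcp client; the redirector hostname and logical file name are placeholders, not actual PRP or CMS endpoints, and a cache miss would transparently fetch the file from the wider data federation.

```python
# Minimal sketch of reading CMS data through an XRootD cache redirector
# rather than the origin site, assuming the xrdcp client is installed.
# The redirector hostname and file path are placeholders.
import subprocess

CACHE_REDIRECTOR = "xcache-redirector.example.edu:1094"   # hypothetical local/top-level redirector
LFN = "/store/example/dataset/file.root"                  # hypothetical CMS logical file name

def fetch_via_cache(lfn, dest="/tmp/input.root"):
    """Copy a file through the cache; a cache miss pulls it from the federation."""
    url = f"root://{CACHE_REDIRECTOR}/{lfn}"
    subprocess.run(["xrdcp", "-f", url, dest], check=True)  # -f: overwrite existing file
    return dest

if __name__ == "__main__":
    print(fetch_via_cache(LFN))
```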

  21. Game Changer: Using Kubernetes to Manage Containers Across the PRP (See talk by: Rob Gardner)
      • “Kubernetes is a way of stitching together a collection of machines into, basically, a big computer.” --Craig McLuckie, Google, now CEO and Founder of Heptio
      • “Everything at Google runs in a container.” --Joe Beda, Google
      • “Kubernetes has emerged as the container orchestration engine of choice for many cloud providers including Google, AWS, Rackspace, and Microsoft, and is now being used in HPC and Science DMZs.” --John Graham, Calit2/QI, UC San Diego
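In the spirit of stitching machines into one big computer, here is a minimal sketch that uses the official Kubernetes Python client to list a cluster's nodes and their allocatable CPUs, memory, and GPUs; it assumes only a valid kubeconfig and is not tied to the actual Nautilus deployment.

```python
# A minimal sketch of treating a Kubernetes cluster as "one big computer":
# list its nodes and their allocatable CPU, memory, and GPU resources using
# the official Python client. Assumes a valid kubeconfig for some cluster;
# nothing here is specific to the actual Nautilus/PRP deployment.
from kubernetes import client, config

def summarize_cluster():
    config.load_kube_config()          # read credentials from ~/.kube/config
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        alloc = node.status.allocatable
        gpus = alloc.get("nvidia.com/gpu", "0")
        print(f"{node.metadata.name}: cpu={alloc['cpu']} memory={alloc['memory']} gpu={gpus}")

if __name__ == "__main__":
    summarize_cluster()
```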

  22. Distributed Computation on the PRP Nautilus HyperCluster: Coupling the SDSU Cluster and SDSC Comet Using Kubernetes Containers
      • Simulating the Injection of CO2 into Brine-Saturated Reservoirs: Poroelastic & Pressure-Velocity Fields Solved in Parallel with MPI Using Domain Decomposition Across Containers
      • Domain: 0.5 km x 0.5 km x 17.5 m, with Three Sandstone Layers Separated by Two Shale Layers
      • Figure: [CO2,aq] Over a 100-Year Simulation (Snapshots at 4 Days, 25 Years, 75 Years, and 100 Years)
      • Developed and Executed MPI-Based Runs on the PRP Kubernetes Cluster
      Source: Chris Paolini and Jose Castillo, SDSU
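The pattern of MPI domain decomposition across containers can be sketched with mpi4py: each rank owns a slab of the grid and exchanges halo planes with its neighbors every time step. The grid size and field below are illustrative placeholders, not the SDSU reservoir code.

```python
# A minimal mpi4py sketch of domain decomposition across containers:
# each MPI rank owns a slab of a 3-D grid and exchanges halo planes with
# its neighbors each step. Resolution and field are placeholders.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NX, NY, NZ = 128, 128, 32                 # global grid (placeholder resolution)
nx = NX // size                           # slab of the x-axis owned by this rank
field = np.zeros((nx + 2, NY, NZ))        # +2 ghost planes for the halo

def exchange_halos(f):
    """Swap boundary planes with left/right neighbors (non-periodic ends)."""
    left, right = rank - 1, rank + 1
    if left >= 0:
        comm.Sendrecv(sendbuf=f[1].copy(), dest=left,
                      recvbuf=f[0], source=left)
    if right < size:
        comm.Sendrecv(sendbuf=f[-2].copy(), dest=right,
                      recvbuf=f[-1], source=right)

for step in range(10):                    # a few illustrative time steps
    exchange_halos(field)
    # ... update interior cells here (poroelastic / pressure-velocity solve) ...
    field[1:-1] += 0.0
```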

  23. Rook Is Ceph Cloud-Native Object Storage ‘Inside’ Kubernetes (https://rook.io/)
      See talk by: Shawn McKee
      Source: John Graham, Calit2/QI
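For a sense of how applications interact with such storage, here is a minimal sketch that posts data to a Ceph object store through its S3-compatible RADOS Gateway using boto3; the in-cluster endpoint, credentials, and bucket name are placeholders, not the actual PRP service (Ceph's gateway also exposes the Swift API mentioned on the Nautilus slides).

```python
# A minimal sketch of posting data to a Rook/Ceph object store via Ceph's
# S3-compatible RADOS Gateway, using boto3. Endpoint, credentials, and
# bucket name are placeholders, not the actual PRP service.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rook-ceph-rgw.example.svc:80",   # hypothetical in-cluster RGW endpoint
    aws_access_key_id="PLACEHOLDER_KEY",
    aws_secret_access_key="PLACEHOLDER_SECRET",
)

s3.create_bucket(Bucket="science-data")
s3.upload_file("results.h5", "science-data", "experiment-42/results.h5")

for obj in s3.list_objects_v2(Bucket="science-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```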

  24. FIONA8: Adding GPUs to FIONAs Supports Data Science Machine Learning
      • Multi-Tenant Containerized GPU JupyterHub Running Kubernetes / CoreOS
      • Eight Nvidia GTX 1080 Ti GPUs, 32 GB RAM, 3 TB SSD, 40G & Dual 10G Ports: ~$13K
      Source: John Graham, Calit2
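A tenant notebook on a GPU FIONA might start by enumerating the visible GPUs and exercising each one; the short sketch below assumes a container image with CUDA-enabled PyTorch installed and is illustrative only.

```python
# A minimal sketch of a first cell in a GPU FIONA notebook: enumerate the
# visible NVIDIA GPUs and run a small tensor operation on each. Assumes a
# container image with CUDA-enabled PyTorch installed.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPUs visible in this container")

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    x = torch.randn(1024, 1024, device=f"cuda:{i}")
    y = x @ x                                  # trivial matmul to exercise the GPU
    print(f"GPU {i}: {name}, result norm = {y.norm().item():.2f}")
```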

  25. Nautilus: A Multi-Tenant Containerized PRP HyperCluster for Big Data Applications, Running Kubernetes with Rook/Ceph Cloud-Native Storage and GPUs for Machine Learning
      • Diagram: 40G SSD and 100G NVMe (6.4 TB) FIONAs and FIONA8s at UCLA, USC, Caltech, UCR, UCSB, UCI, UCSC, Stanford, UCAR, SDSU, and Hawaii, with 100G Gold and Epyc NVMe Nodes at SDSC and Calit2
      • Kubernetes (CentOS 7) with an sdx-controller (controller-0); Rook/Ceph Provides Block/Object/FS Storage with a Swift API Compatible with SDSC, AWS, and Rackspace
      March 2018, John Graham, Calit2/QI

  26. Running Kubernetes/Rook/Ceph on PRP Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data
      • Diagram: The Same Nautilus Sites as Above, with the 40G FIONAs Now Carrying 160 TB of Disk Each, Plus 100G NVMe (6.4 TB) Nodes, for a Distributed Total of More Than a Petabyte
      March 2018, John Graham, UCSD

  27. Collaboration Opportunity with OSG & PRP on Distributed Storage
      • OSG Is Operating a Distributed Caching CI; At Present, 4 Caches Provide Significant Use
      • Total Data Volume Pulled Last Year Is Dominated by 4 Caches (1.8 PB, 1.2 PB, 1.6 PB, and 210 TB)
      • StashCache Users Include: LIGO and DES
      • PRP Kubernetes Infrastructure Could Either Grow Existing Caches by Adding Servers or Add Additional Locations
      See talks by: Alex Feltus, Derek Weitzel, Marcelle Soares-Santos
      Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
