StashCache
Derek Weitzel, Open Science Grid (with slides from Brian Bockelman)
2015 OSG All Hands Meeting, Northwestern University
Motivation
Opportunistic computing is like giving away empty airline seats; the plane was going to fly regardless. Opportunistic storage is like giving away real estate.
Motivation
• Using the SE (storage element) paradigm has been a colossal failure for opportunistic VOs.
• Systems for CMS and ATLAS are robust and efficient, but have proven impossible for others. The cost of management is too high, and opportunistic VOs are unable to command site admin time.
• Key to this failure is the underlying assumption in the SE paradigm that file loss is an exceptional event.
• Again, "storage is like real estate."
• To be successful, opportunistic storage must treat file loss as an everyday, expected occurrence.
• The lack of high-speed local storage significantly decreases the range of workflows opportunistic VOs can run on the OSG.
Caching
• A file is downloaded locally to the cache from an origin server on first access.
• On future accesses, the local copy is used.
• When more room needs to be made, "old" files are removed (by some algorithm that decides the definition of "old").
• Downsides:
  • Caching is only useful if the working set size is less than the cache size. Otherwise, system performance is limited to the bandwidth of the system feeding the cache.
  • Working set size is difficult to estimate in a multi-VO environment.
  • Not all workflows are supported. Caching does not work well if files need to be modified.
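As an illustration of the eviction idea, a minimal least-recently-used cache sketch is shown below. This is just one possible definition of "old," not the policy StashCache itself uses; origin_fetch and the size limit are placeholders.

    # Minimal least-recently-used cache sketch, illustrating one possible
    # definition of "old". This is not the eviction policy StashCache uses;
    # origin_fetch and the capacity are placeholders.
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity_bytes, origin_fetch):
            self.capacity = capacity_bytes
            self.fetch = origin_fetch            # callable: name -> bytes
            self.files = OrderedDict()           # name -> bytes, oldest first
            self.used = 0

        def get(self, name):
            if name in self.files:               # cache hit: serve the local copy
                self.files.move_to_end(name)
                return self.files[name]
            data = self.fetch(name)              # cache miss: pull from the origin
            while self.files and self.used + len(data) > self.capacity:
                _, evicted = self.files.popitem(last=False)   # drop the "oldest" file
                self.used -= len(evicted)
            self.files[name] = data
            self.used += len(data)
            return data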
StashCache — cache sites: Syracuse, UChicago, BNL, UNL, Illinois, UCSD
Growth of StashCache
• Syracuse was not one of the initial StashCache sites.
• They wanted StashCache for two reasons:
  • Decrease the network load on the WAN from OSG jobs
  • Cache the LIGO data set locally (discussed later)
Growth
• Syracuse's StashCache is now contributing to the OSG StashCache federation.
• For example, over the last 24 hours it transferred data out of the cache at an average of 7.7 Gbps.
StashCache Architecture
1. The user places files on the OSG-Connect "origin" server.
2. Jobs request the file from the nearby caching proxy.
3. The caching proxy queries the federation (via the OSG redirector) to discover the file's location, and the job downloads the file from the cache.
How is it used?
• CVMFS: most common access method
• StashCP: custom-developed tool
  • Uses CVMFS when possible, falls back to XRootD tools
CVMFS
• FUSE-based filesystem
• Example path: /cvmfs/stash.osgstorage.org/user/dweitzel/public/blast/data/yeast.aa
  • /cvmfs — use the CVMFS service
  • stash.osgstorage.org — domain for CVMFS (not necessarily a web address)
  • user/dweitzel/public/blast/data — cached filesystem namespace
  • yeast.aa — data transferred through StashCache
CVMFS
• The filesystem namespace is cached on the site's HTTP proxy infrastructure.
• Read-only filesystem
• Users can run regular commands on the directories (ls, cp, …)
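Because the namespace is a read-only POSIX filesystem, a job can simply open the example path from the previous slide. The sketch below assumes the CVMFS client is installed and stash.osgstorage.org is mounted on the worker node.

    # Reading a StashCache file through the CVMFS mount, assuming the CVMFS
    # client is configured for stash.osgstorage.org on the worker node.
    # The path is the example from the previous slide.
    path = "/cvmfs/stash.osgstorage.org/user/dweitzel/public/blast/data/yeast.aa"

    with open(path, "rb") as f:          # plain POSIX open(); the data is fetched
        header = f.read(1024)            # through StashCache and cached locally
    print(len(header), "bytes read from", path)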
StashCache + CVMFS
• A service periodically scans the origin server and publishes the filesystem to CVMFS:
  • Looks for changes
  • Checksums the changed files
• The actual CVMFS namespace stores only the checksums and metadata.
• DOI: 10.1088/1742-6596/898/6/062044
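A minimal sketch of the scan-and-checksum idea follows; it is not the actual StashCache/CVMFS publisher, and the origin path and state file are hypothetical.

    # Minimal sketch of the periodic scan-and-checksum idea (not the actual
    # StashCache/CVMFS publisher). ORIGIN_ROOT and STATE_FILE are hypothetical.
    import hashlib, json, os

    ORIGIN_ROOT = "/stash/origin"                     # hypothetical origin export
    STATE_FILE = "/var/lib/publisher/state.json"      # hypothetical record of the last publish

    def sha1_of(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def scan(root):
        """Return {relative_path: [size, mtime]} for every file under root."""
        out = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                out[os.path.relpath(full, root)] = [st.st_size, int(st.st_mtime)]
        return out

    def publish_changes():
        try:
            with open(STATE_FILE) as f:
                previous = json.load(f)
        except FileNotFoundError:
            previous = {}
        current = scan(ORIGIN_ROOT)
        # Only changed files are checksummed; the namespace would store just
        # the checksum and metadata, not the data itself.
        catalog = {}
        for rel, meta in current.items():
            if previous.get(rel) != meta:
                catalog[rel] = {"meta": meta,
                                "sha1": sha1_of(os.path.join(ORIGIN_ROOT, rel))}
        with open(STATE_FILE, "w") as f:
            json.dump(current, f)
        return catalog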
StashCP
• Custom tool developed by the StashCache team
• Uses GeoIP to determine the "nearest" cache
• Uses CVMFS if available; otherwise uses XRootD tools to copy from the cache
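The sketch below illustrates the decision logic just described: read through CVMFS if the path is mounted, otherwise copy from a cache with xrdcp. The cache list and the direct xrdcp call are assumptions for illustration; the real stashcp tool does its own GeoIP-based cache discovery.

    # Sketch of the StashCP decision logic: CVMFS first, then fall back to
    # copying from a cache with the XRootD client. CACHES is hypothetical.
    import os, shutil, subprocess

    CACHES = ["xrootd.example-cache-1.org", "xrootd.example-cache-2.org"]  # hypothetical

    def fetch(stash_path, dest):
        cvmfs_path = "/cvmfs/stash.osgstorage.org" + stash_path
        if os.path.exists(cvmfs_path):
            shutil.copy(cvmfs_path, dest)      # POSIX read through CVMFS
            return
        for cache in CACHES:                   # nearest-first in the real tool
            url = "root://" + cache + "/" + stash_path
            if subprocess.call(["xrdcp", url, dest]) == 0:
                return
        raise RuntimeError("all caches failed for " + stash_path)

    # Example (path from the CVMFS slide):
    # fetch("/user/dweitzel/public/blast/data/yeast.aa", "./yeast.aa")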
What to Use?
• CVMFS:
  • Takes up to 8 hours for files to appear
  • POSIX-like interface; jobs can even open() the file directly
• StashCP:
  • Files are instantly available to jobs
  • Batch copy mode only
Monitoring / Accounting
Per-File Monitoring (beta) — example: Minerva (FNAL)
Science Enabled
• Minerva: public data
• LIGO: private data
• Bioinformatics: public data
Minerva adopts StashCache
• Minerva was seeing very poor efficiency in jobs: lots of waiting to copy "flux" files (inputs to neutrino MC).
• Jobs could not proceed until copying finished.
• We suggested switching to StashCache (accessed via CVMFS) to alleviate the load of simultaneous copies:
  • Make symlinks to files in /cvmfs/minerva.osgstorage.org/ in the same place the previous copies were going, so there is no change to downstream code (see the sketch after this slide).
• This worked very well at first, but a large volume of jobs eventually seemed to slow things down. Pulling too much from HCC?
• On-site jobs were supposed to be set up to read directly from the source (FNAL dCache) rather than going through the Nebraska redirector. We are currently verifying that this was set up correctly, and expect redirector load to decrease once that is verified and corrected as needed.
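A minimal sketch of the symlink approach, assuming a hypothetical local directory name that the jobs already read from; the idea is simply to point the expected paths at the CVMFS mount so no downstream code changes.

    # Sketch of the symlink approach: the jobs keep opening the same local
    # paths, but the data now comes through StashCache via CVMFS.
    # LOCAL_BASE is a hypothetical directory the jobs already read from.
    import os

    CVMFS_BASE = "/cvmfs/minerva.osgstorage.org"
    LOCAL_BASE = "flux_files"   # hypothetical

    def link_flux_file(rel_path):
        src = os.path.join(CVMFS_BASE, rel_path)
        dst = os.path.join(LOCAL_BASE, os.path.basename(rel_path))
        os.makedirs(LOCAL_BASE, exist_ok=True)
        if not os.path.islink(dst):
            os.symlink(src, dst)   # jobs open dst as before
        return dst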
LIGO + StashCache
• LIGO data is private for a few years.
• The data is protected by using a secure federation:
  • CVMFS uses the X.509 certificate from the user's environment.
  • The certificate is propagated to the cache server to access the data.
• Publication: DOI 10.1145/3093338.3093363
LIGO Data Access
• Roughly 1 Mbps per core
• 2016: 13.8 million hours, 5.8 PB
• 2017: 8.2 million hours, 3.4 PB
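As a rough consistency check (assuming the quoted hours are core-hours, which is an interpretation rather than something stated on the slide), 1 Mbps per core over the 2016 usage works out to the same order of magnitude as the reported volume:

    # Rough consistency check of the ~1 Mbps/core figure.
    # Assumes the quoted 13.8 million hours are core-hours.
    hours_2016 = 13.8e6
    bytes_per_second = 1e6 / 8            # 1 Mbps = 0.125 MB/s
    total_bytes = hours_2016 * 3600 * bytes_per_second
    print(total_bytes / 1e15)             # ~6.2 PB, same order as the 5.8 PB reported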
UNL Bioinformatics Core Research Facility
Microbiome composition changes (often rapidly) over time.
Bioinformatics (Jean-Jack)
• Each job scans a 25 GB data set three times.
• The 25 GB data set is stored in StashCache and pulled down for each job.
• It is copied to the local node to optimize the second and third scans (see the sketch after this slide).
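A sketch of the copy-once, scan-three-times pattern; the stashcp invocation and the data set path are illustrative assumptions, not the actual workflow scripts.

    # Sketch of the copy-once, scan-three-times pattern described above.
    # DATASET and the stashcp invocation are illustrative assumptions.
    import os, subprocess, tempfile

    DATASET = "/user/example/public/microbiome/reference_25GB.db"  # hypothetical

    def run_job(scan):
        scratch = tempfile.mkdtemp()                     # node-local scratch space
        local_copy = os.path.join(scratch, os.path.basename(DATASET))
        # One pull from the nearby StashCache cache per job...
        subprocess.check_call(["stashcp", DATASET, local_copy])
        # ...then all three scans read the node-local copy, not the cache.
        for i in range(3):
            scan(local_copy, i)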
Summary
• Because CVMFS also caches on the local filesystem, we only have lower-bound estimates.
• Over the last year:
  • ~10 PB of data transferred
  • ~88% cache hit rate
What's Next
• Writable Stash
  • Uses authentication with SciTokens
• File-based monitoring
Writable Stash
• We have always had issues with writing back to Stash.
• Existing options include:
  • HTCondor's Chirp: requires going back through the submit host
  • SSH key: you have to transfer your SSH key onto the OSG
Writable Stash
• Uses bearer tokens (SciTokens)
• Short-lived tokens with very restrictive capabilities
• Example HTTP request to the XRootD / Stash server:
  > PUT /user/dweitzel/stuff HTTP/1.1
  > Host: demo.scitokens.org
  > User-Agent: curl/7.52.1
  > Accept: */*
  > Authorization: Bearer XXXXXXXX
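For illustration, a minimal sketch of issuing such an authorized PUT from Python; the host, path, and token mirror the example above and are placeholders, not a working service.

    # Minimal sketch of the bearer-token PUT shown above, using the requests
    # library. Host, path, and token are placeholders from the slide's example.
    import requests

    token = "XXXXXXXX"   # a short-lived SciToken obtained out of band
    with open("stuff", "rb") as payload:
        resp = requests.put(
            "https://demo.scitokens.org/user/dweitzel/stuff",
            data=payload,
            headers={"Authorization": "Bearer " + token},
        )
    resp.raise_for_status()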
Resources
• Admin docs:
  • https://opensciencegrid.github.io/StashCache/
• User docs (maintained by OSG User Support):
  • https://support.opensciencegrid.org/support/solutions/articles/12000002775-transferring-data-with-stashcache