Rucio Concepts and principles
Rob Gardner, Benedikt Riedel (University of Chicago), Mario Lassnig (CERN)
Open Science Grid Blueprint, December 8, 2017
This talk
● These slides are a compendium of individual topics relevant for input to further discussion today
● Special thanks to Mario Lassnig, who provided the vast majority of input
Rucio in a nutshell
● Main functionalities
  ○ Discovery, Location, Transfer, Deletion
  ○ Quota, Permission, Consistency
  ○ Monitoring, Analytics
  ○ Can enforce computing models
● Integration with workload management
● Automation of operations
● Enables heterogeneous data management
  ○ No vendor/product lock-in
  ○ Able to follow the market
Total ATLAS data: 1+ billion files, 2+ million files/day, 1+ Petabyte/day
Namespace handling
● Smallest addressable unit is the file
● Files can be grouped into datasets
● Datasets can be grouped into containers
● Names are partitioned by scopes
  ○ To distinguish users, groups and activities
  ○ Accounts map to users/groups/activities
● Multiple data ownership across accounts
● Large set of available metadata, e.g.
  ○ Data management: size, checksums, creation times, access times, …
  ○ Physics: run identification, derivations, events, …
(see the client sketch below)
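This namespace maps directly onto the Rucio Python client. A minimal sketch of registering and nesting DIDs, assuming a working client configuration and an existing scope user.jdoe with files already registered on some RSE (all names here are illustrative):

from rucio.client import Client

client = Client()

# Create a dataset and a container in the user's scope
client.add_dataset(scope='user.jdoe', name='user.jdoe.analysis.2017.ds')
client.add_container(scope='user.jdoe', name='user.jdoe.analysis.2017.cont')

# Attach previously registered files to the dataset, then the dataset to the container
files = [{'scope': 'user.jdoe', 'name': 'file_%04d.root' % i} for i in range(3)]
client.attach_dids(scope='user.jdoe', name='user.jdoe.analysis.2017.ds', dids=files)
client.attach_dids(scope='user.jdoe', name='user.jdoe.analysis.2017.cont',
                   dids=[{'scope': 'user.jdoe', 'name': 'user.jdoe.analysis.2017.ds'}])

Data-management metadata (sizes, checksums, timestamps) is recorded when the file replicas are registered and can be retrieved through the same client.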
Declarative data management
● Express what you want, not how you want it
  ○ e.g., "3 copies of this dataset, distributed evenly across two continents, with 1 copy on TAPE"
  ○ Rules can be dynamically added and removed by all users, some pending authorisation
  ○ Evaluation engine resolves all rules and tries to satisfy them with transfers/deletions
● Replication rules (see the sketch below)
  ○ Lock data against deletion in particular places for a given lifetime or pin
  ○ Primary replicas have indefinite-lifetime rules
  ○ Secondary replicas are dynamically created replicas based on traced usage and access popularity
● Subscriptions
  ○ Automatically generate rules for newly registered data matching a set of filters/metadata
  ○ e.g., spread project=data17_13TeV and data_type=AOD evenly across T1s
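A replication rule is a single client call; the rule engine then works out the necessary transfers and deletions. A minimal sketch, with an illustrative DID and RSE expression (exact keyword arguments may vary between Rucio versions):

from rucio.client import Client

client = Client()

# "2 copies of this dataset on US disk, kept for 30 days"
client.add_replication_rule(
    dids=[{'scope': 'data17_13TeV', 'name': 'data17_13TeV.example.AOD'}],
    copies=2,
    rse_expression='cloud=US&type=DATADISK',
    lifetime=30 * 24 * 3600,  # in seconds; omit for an indefinite (primary) rule
)

Removing the rule later releases its locks, after which the replicas become eligible for clean-up.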
Monitoring
● RucioUI provides several views for different types of users
  ○ Normal users: data discovery and details, transfer requests
  ○ Site admins: quota management and transfer approvals
  ○ Admins: account / identity / storage management
● Monitoring
  ○ Internal system health monitoring (Graphite / Grafana)
  ○ Transfer / staging / deletion monitoring using industry-standard architectures (ActiveMQ / Kafka / Spark / HDFS / ElasticSearch / InfluxDB / Grafana)
● Analytics
  ○ Periodic full database dumps to Hadoop (pilot traces, transfer events, …)
  ○ Used for studies, e.g., transfer time estimation, which is already in a pre-production stage
Third party copy
● Rucio provides a generic transfertool API (see the interface sketch below)
  ○ submit_transfers(), query_transfer_status(), cancel_transfers(), …
  ○ Independent of underlying transfer service
  ○ Asynchronous interface to any potential third-party tool
● Currently the only available implementation of the transfertool API is FTS3
  ○ Additional notification channel via ActiveMQ for instant acknowledgments
  ○ Potential to include GlobusOnline for improved HPC data transfers
● FTS3 deployment
  ○ CERN Pilot, CERN Production, RAL Production, BNL Production
  ○ We distribute our transfers across all FTS3 servers based on file destination
    ■ (We also have one dedicated for OSG use in production)
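Conceptually, the transfertool API is a small plugin interface; any service that can implement these three calls could be slotted in. The class layout below is an illustrative sketch, not the actual Rucio code:

class Transfertool(object):
    """Interface Rucio expects from a third-party transfer service."""

    def submit_transfers(self, transfers):
        """Submit a batch of transfer requests, return external transfer IDs."""
        raise NotImplementedError

    def query_transfer_status(self, transfer_ids):
        """Return the current state of each transfer (queued, active, done, failed)."""
        raise NotImplementedError

    def cancel_transfers(self, transfer_ids):
        """Cancel transfers that are no longer needed."""
        raise NotImplementedError


class FTS3Transfertool(Transfertool):
    """Sketch of the FTS3 backend: each call maps onto the FTS3 REST API."""

    def __init__(self, fts_host):
        self.fts_host = fts_host  # e.g. one of the CERN/RAL/BNL production servers

    def submit_transfers(self, transfers):
        # POST a job description to the FTS3 server and return the job IDs
        pass  # (REST details omitted)

Since the interface is asynchronous, completion is learned either by polling query_transfer_status() or through the ActiveMQ notification channel.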
Topology
● Storage systems are abstracted as Rucio Storage Elements (RSEs)
  ○ Logical definition, not a software stack
  ○ Mapping between activities, hostnames, protocols, ports, paths, sites, …
  ○ Define priorities between protocols and numerical distances between sites
  ○ Can be tagged with metadata for grouping
  ○ Files on RSEs are stored deterministically via a hash function (see the sketch below)
    ■ Can be overridden (e.g., useful for Tier-0, TAPE, fixed data output experiments, …)
● Rucio's topology can exist standalone outside an information catalogue
  ○ However, for a non-trivial number of sites this quickly becomes infeasible
    ■ We suggest having a flexible way of describing resources
  ○ For ATLAS, we use AGIS (ATLAS Grid Information System) and sync it to Rucio via Nagios
  ○ AGIS is now evolving into the generic CRIC (Computing Resource Information Catalogue)
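Deterministic placement means any client can compute a file's path on an RSE from its DID alone, with no replica catalogue lookup. A sketch of the commonly used hash convention (the convention is configurable per RSE, and the names below are illustrative):

import hashlib

def deterministic_path(scope, name):
    """Map a DID onto a storage path: <scope>/<hash[0:2]>/<hash[2:4]>/<filename>."""
    md5 = hashlib.md5(('%s:%s' % (scope, name)).encode('utf-8')).hexdigest()
    scope_path = scope.replace('.', '/')  # e.g. user.jdoe -> user/jdoe
    return '%s/%s/%s/%s' % (scope_path, md5[0:2], md5[2:4], name)

print(deterministic_path('data17_13TeV', 'AOD.12345._000001.pool.root.1'))
# e.g. data17_13TeV/xx/yy/AOD.12345._000001.pool.root.1

Overriding this mapping per RSE is what allows fixed layouts for Tier-0 or TAPE endpoints.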
Key design principles
● Horizontal scalability of servers and services
● Data streams
  ○ Stateless API: serve each request independently
  ○ Servers can handle arbitrary-length responses (e.g., list 1 billion files)
● Work sharding (see the sketch below)
  ○ All daemons share their work-queues
  ○ Algorithm for work selection is independent of the length of the work-queue!
  ○ Elastic and fail-safe
    ■ If one service goes down (e.g., node failure), others take over automatically; no need to reconfigure or restart
● Fault tolerance
  ○ Fail hard and early, but keep running and retry once up
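The work-sharding principle can be illustrated with hash partitioning: each daemon instance learns from heartbeats how many peers are alive and claims only the slice of the shared queue whose hashed identifiers fall to it. This is a conceptual sketch, not the actual Rucio implementation:

import hashlib

def owns(item_id, worker_index, total_workers):
    """True if this worker owns the item. Ownership depends only on the item id
    and the number of live workers, never on the length of the queue."""
    h = int(hashlib.md5(str(item_id).encode('utf-8')).hexdigest(), 16)
    return h % total_workers == worker_index

# Worker 1 of 3 scans the shared queue and picks up only its slice
queue = ['rule-%d' % i for i in range(10)]
mine = [item for item in queue if owns(item, worker_index=1, total_workers=3)]

# If a worker dies, the survivors see total_workers drop via the heartbeats,
# so the orphaned slice is redistributed automatically on their next pass.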
Rucio daemons and operations
● 10 daemons
  ○ Minimum 2 daemons required
    ■ Rule evaluation daemon, transfer handling daemon
  ○ All others give extra functionality and can be enabled as required
    ■ Deletion, rebalancing, popularity, tracing, messaging, …
● Sites do not run any Rucio services; they only need to operate storage
● ATLAS DDM Central Team operates 320+ PB on 120 sites with <2 FTE!
  ○ Due to all the automation that the Rucio daemons provide
Known Rucio limits
● Backend database performance
  ○ Scaling tests up to LHC Run-3 expectations showed no problems on the CERN Oracle instance
  ○ Want to do more scaling tests with MariaDB and PostgreSQL
● Single-node limit for rule evaluation
  ○ 8 GB of RAM can serve a single rule with at most 500,000 files
  ○ This limitation is currently being addressed
● Automated deployment of nodes under load
  ○ Datacenter issue
  ○ Currently requires an operator to bring up new nodes
  ○ Want to automate this based on internal system performance metrics
Rucio dependencies
● Python 2.7
  ○ Major parts already Python 3 compatible
● Multiple database support (see the configuration sketch below)
  ○ Object-relational mapper
  ○ SQLite, MySQL/MariaDB, PostgreSQL, Oracle
● File transfer service
  ○ FTS3
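The object-relational mapper (SQLAlchemy in Rucio's case) is what keeps the backends interchangeable: the same models run against any of them, and only the connection string in the Rucio configuration changes. A minimal illustration with placeholder hostnames and credentials (each non-SQLite backend also needs its own Python driver installed):

from sqlalchemy import create_engine

# Equivalent connection strings for the supported backends
engines = {
    'sqlite':     'sqlite:////tmp/rucio.db',
    'mysql':      'mysql://rucio:secret@dbhost/rucio',
    'postgresql': 'postgresql://rucio:secret@dbhost/rucio',
    'oracle':     'oracle://rucio:secret@dbhost/rucio',
}
engine = create_engine(engines['sqlite'])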
Monitoring Rucio API usage
● All the DDM data is dumped to HDFS once a day
● All the traces are kept in Hadoop and ES
● Internal monitoring with Grafana
[Dashboard screenshots: API errors, API usage, web UI, operations]
API Usage in UC Elasticsearch
[Screenshot]
Daemon activity
● Judge - replication rule engine
● Automatix - generates fake data and uploads it to an RSE
● Conveyor - handles requests for data transfers
● Undertaker - obsoletes data identifiers with expired lifetimes
● Hermes - delivers messages to an asynchronous broker
● Kronos - consumes tracer messages and updates replica last access times accordingly
● Reaper - deletes expired data replicas
● Necromancer - tries to repair erroneous rules by selecting different replica destinations
● Transmogrifier - applies subscriptions and generates replication rules
Understanding and optimizing FTS usage
Requires a lot of different data sources:
● Rucio (detailed logs of transactions)
● FTS (optimizer settings, reasons behind decisions)
● Site storage load (from summing up all the traffic)
● Network (perfSONAR)
For the first time we have all the information and can do detailed analysis, even simulations of how the system would behave with different settings. We have found a lot of room for improvement.
ATLAS Statistics
● ~1 billion active files
● ~2 billion archived files
● ~15M datasets/containers
● 840 storage endpoints
● 340 PB of storage, almost full
● 1.5 PB/day transferred, peaks up to 2.5 PB/day
● 2 PB/day deleted
XENON1T Statistics
● >1.2M files
● ~16k datasets
● 9 storage endpoints
● 1887.5 TB of available storage
● 854.1 TB of available storage used
● Adding 1.3 TB per day, 200+ files per hour
● >115 GB per hour transferred
AMS Statistics
● ~1M files
● ~50k datasets
● 9 storage endpoints
● ~2 PB of available storage
● ~1.5 PB of available storage used
Comparison with similar systems
● PhEDEx
● Globus
  ○ Can serve as an alternative to FTS3 for data transport, but with an entirely different set of management principles
● DynaFed, EOS Federation, XRootD Federation
  ○ Inter-cluster shared filesystem
  ○ Dynamic discovery of data
  ○ Can be used as RSEs
Rucio vocabulary
● DID (Data IDentifier)
  ○ File
  ○ Dataset
  ○ Container
● Scope
  ○ DID namespace partition
● RSE (Rucio Storage Element)
  ○ Topology description of a storage endpoint
● Rules
  ○ Declarative mapping of DIDs to RSEs
● Subscription
  ○ Automatic generation of rules
References
● Code: https://github.com/rucio/rucio
● Web: https://rucio.cern.ch/
● Docker: https://hub.docker.com/r/rucio
● Support: https://rucio.slack.com/
● Mail: rucio-dev@cern.ch