PanDA
Tadashi Maeno (BNL)
NPPS meeting, Jun 19
PanDA in a Nutshell
➢ PanDA = Production and Distributed Analysis System
  – Designed to meet ATLAS production/analysis requirements for a data-driven workload management system capable of operating at LHC data processing scale
➢ Continuous evolution while steadily running for ATLAS since 2005, including data-taking periods
  – Significant refactoring to move from MySQL to Oracle, major system reengineering to implement a new paradigm for high-level workload management and a fine-grained processing mechanism, migration of the ATLAS DDMS from DQ2 to Rucio, migration to new pilot provisioning machinery, …
➢ ~150k running production+analysis jobs on ~440k cores, ~32M HTTPS sessions per day, 56M Oracle transactions per day, 1.6k individual analysis users in 1 year
➢ ATLAS PanDA, BigPanDA, BigPanDA++, beyond ATLAS, Google projects, ...
➢ Plenty of advanced and interesting functions/activities, but only recent ATLAS ones are shown here due to the limited time slot
PanDA in ATLAS Computing
[Architecture diagram: production managers and end-users submit tasks to DEFT/JEDI, which generate jobs for the PanDA Server; Harvester instances (on edge nodes, dedicated nodes, or at sites) request jobs or pilots, spin up VMs/containers in clouds, and submit, monitor, increase/throttle, and kill pilots on grid-site compute nodes (via CEs or the pilot scheduler) and at HPC centers; AGIS is consulted for site/queue configuration; pilots get/update jobs from the PanDA Server]
(Tadashi Maeno, CHEP2018, 9-13 July 2018, Sofia, Bulgaria)
Harvester 1/2
➢ A resource-facing service between the PanDA server and a collection of pilots (workers), responsible for pilot provisioning
➢ Stateless service plus a database for local bookkeeping
➢ Flexible deployment model and modular design for various resource types and workflows
  – On HPC edge nodes with a limited runtime environment → a single node + multi-threading + sqlite3. On dedicated nodes → multiple nodes + multi-processing + MariaDB
  – Plugins with native APIs, such as SLURM, LSF, EC2, GCE, k8s, gfal, …, and plugins for 3rd-party services, such as condor, ARC interface, Rucio, FTS, Globus Online, ... (a plugin sketch follows below)
➢ Objectives
  – A common machinery for pilot provisioning on all computing resources
  – Better resource monitoring
  – Coherent implementations for HPCs
  – Timely optimization of CPU allocation among various resource types and removal of batch-level partitioning
  – Tight integration between WFMS and resources for new workflows
➢ The project was launched in Dec 2016 with 11 developers in the US (BNL, UTA, Duke U, ANL), Norway, Slovenia, Taiwan, Italy, and Russia
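To illustrate the plugin-based design, here is a minimal sketch of what a batch-system submitter plugin could look like. This is not the actual Harvester plugin API: the class and method names (SlurmSubmitter, submit_workers) and the WorkSpec stand-in are assumptions for illustration only.

```python
# Illustrative sketch of a Harvester-style submitter plugin (not the real API).
# WorkSpec, submit_workers, and the sbatch options used are assumptions.
import subprocess


class WorkSpec:
    """Minimal stand-in for a worker record kept in the local bookkeeping DB."""
    def __init__(self, worker_id, n_cores, walltime_min):
        self.worker_id = worker_id
        self.n_cores = n_cores
        self.walltime_min = walltime_min
        self.batch_id = None


class SlurmSubmitter:
    """Submitter plugin talking to a SLURM batch system via its native CLI."""
    def __init__(self, partition, pilot_wrapper):
        self.partition = partition
        self.pilot_wrapper = pilot_wrapper  # script that bootstraps the pilot

    def submit_workers(self, workspec_list):
        """Submit one batch job (worker) per WorkSpec; return per-worker status."""
        results = []
        for ws in workspec_list:
            cmd = [
                "sbatch", "--parsable",
                "--partition", self.partition,
                "--ntasks", str(ws.n_cores),
                "--time", str(ws.walltime_min),
                self.pilot_wrapper,
            ]
            proc = subprocess.run(cmd, capture_output=True, text=True)
            if proc.returncode == 0:
                ws.batch_id = proc.stdout.strip()  # SLURM job ID
                results.append((ws.worker_id, True, ""))
            else:
                results.append((ws.worker_id, False, proc.stderr.strip()))
        return results
```

Monitor and sweeper plugins would follow the same pattern, wrapping the corresponding batch, cloud, or HPC status and kill commands behind a uniform interface.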
Harvester 2/2
➢ Entire ATLAS grid migrated by Jan 2019
➢ ATLAS High Level Trigger (HLT) CPU farm with 50k cores, aka Sim@P1, in production
➢ Successfully demonstrated GCE + GCE API + Google Storage + preemptible VMs
➢ All US DOE HPCs in production since Feb 2018
[Plots: the number of events processed per day at US HPCs around the migration; migration of UK grid resources; effect of switching from normal VMs to preemptible VMs on GCE]
Integration of HPCs with Jumbo Payload
➢ Batch jobs are no longer atomic entities in PanDA, thanks to high-level workload management capabilities and event-level bookkeeping
➢ Dynamic shaping of jobs based on real-time information about the available compute power and walltime of each resource (a shaping sketch follows below)
➢ No dedicated/custom tasks for HPCs
  – Old: special tasks to produce big jobs at HPCs
  – New: common tasks shared among various resources, including HPCs, producing properly sized jobs at each resource
➢ In full production at Theta/ALCF and Cori/NERSC, while at limited scale at Titan/OLCF due to the fragile OLCF file system
➢ Successfully ran at MareNostrum 4 at BSC; will continue on MN5, which was recently granted by EuroHPC
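A minimal sketch of what event-level job shaping might look like; the function, its inputs, and the 10% safety margin are illustrative assumptions, not JEDI's actual algorithm.

```python
# Hedged sketch of event-level job shaping (not PanDA/JEDI code).
# Inputs and the safety margin are illustrative assumptions.

def shape_job(n_cores_available, walltime_sec, sec_per_event, events_remaining,
              safety_margin=0.10):
    """Decide how many events to pack into one batch job for a given resource.

    The job size follows the real-time snapshot of the resource (cores and
    remaining walltime) instead of a fixed, HPC-specific task definition.
    """
    usable_sec = walltime_sec * (1.0 - safety_margin)   # leave room for stage-in/out
    events_per_core = int(usable_sec // sec_per_event)  # events one core can finish
    n_events = min(n_cores_available * events_per_core, events_remaining)
    return {"n_cores": n_cores_available, "n_events": n_events}


# Example: a large HPC slot and a small grid slot draw very different job
# sizes from the same common task.
print(shape_job(n_cores_available=4096, walltime_sec=6 * 3600,
                sec_per_event=300, events_remaining=10_000_000))
print(shape_job(n_cores_available=8, walltime_sec=24 * 3600,
                sec_per_event=300, events_remaining=10_000_000))
```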
Resources via Kubernetes
➢ Use Kubernetes as a CE + a batch system
  – Central Harvester manages remote resources through Kubernetes
➢ Based on SLC6 containers and the CVMFS CSI driver
➢ Proxy passed through a K8s Secret (a pod-creation sketch follows below)
➢ Still room for evolution, e.g. allow arbitrary container/option execution, maybe split I/O into a 1-core container, improve usage of the infrastructure
➢ Tested at scale for some weeks at CERN; being continued at UVic
[Diagram: Harvester core components (Submitter creating new pods, Monitor polling pod states, Sweeper deleting failed pods) talk to the K8s master; pilot containers run on the K8s nodes with I/O against an RSE; the default K8s scheduler balances jobs round-robin across nodes, while policy tuning packs nodes]
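As a rough illustration of the K8s-as-CE idea, the sketch below creates a pilot pod through the Kubernetes Python client, mounting the proxy from a Secret and CVMFS from a PVC backed by the CSI driver. The image name, wrapper script, namespace, secret name, and claim name are assumptions, not the actual ATLAS configuration.

```python
# Hedged sketch of creating a pilot pod via the Kubernetes Python client.
# Image, wrapper script, namespace, secret name, and PVC name are assumptions.
from kubernetes import client, config


def create_pilot_pod(pod_name, namespace="panda", image="atlas-slc6-pilot:latest"):
    config.load_kube_config()  # or load_incluster_config() when running inside the cluster
    core_v1 = client.CoreV1Api()

    container = client.V1Container(
        name="pilot",
        image=image,                              # SLC6-based pilot image (assumed tag)
        command=["/usr/local/bin/run-pilot.sh"],  # hypothetical pilot wrapper
        resources=client.V1ResourceRequirements(requests={"cpu": "1", "memory": "2Gi"}),
        volume_mounts=[
            client.V1VolumeMount(name="proxy", mount_path="/proxy", read_only=True),
            client.V1VolumeMount(name="cvmfs", mount_path="/cvmfs", read_only=True),
        ],
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=pod_name, labels={"app": "pilot"}),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[container],
            volumes=[
                # Grid proxy delivered through a K8s Secret
                client.V1Volume(name="proxy",
                                secret=client.V1SecretVolumeSource(secret_name="grid-proxy")),
                # CVMFS mounted via a PVC provisioned by the CVMFS CSI driver
                client.V1Volume(name="cvmfs",
                                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                                    claim_name="cvmfs-atlas")),
            ],
        ),
    )
    return core_v1.create_namespaced_pod(namespace=namespace, body=pod)
```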
HPC/GPU + ML + MPI
➢ Distributed training on an HPC or GPU cluster through PanDA and Harvester
➢ Multi-node payload with MPI to be prepared by users for now (a minimal skeleton follows below)
  – Might provide a common MPI framework in the future
➢ On-demand deployment of user container images
➢ Being tried at the BNL Institutional Cluster
[Diagram: the user uploads a container image to Docker Hub and submits the task; Harvester on the head node (with outbound connectivity) fetches the job, deploys the image, and launches the GPU job in containers on the compute nodes with aprun, using a shared file system]
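The kind of user-prepared multi-node MPI payload referred to above might look like the synchronous-SGD toy below; mpi4py, the learning rate, and the random "gradients" are illustrative assumptions, not an ATLAS-provided framework.

```python
# Minimal sketch of a user-prepared multi-node MPI training payload.
# Assumes mpi4py is available in the user container; the averaging step
# stands in for a real gradient exchange.
import numpy as np
from mpi4py import MPI


def main():
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each rank trains on its own data shard; here a toy "gradient" per step.
    rng = np.random.default_rng(seed=rank)
    params = np.zeros(10)
    for step in range(100):
        local_grad = rng.normal(size=params.shape)         # placeholder for backprop
        mean_grad = np.empty_like(local_grad)
        comm.Allreduce(local_grad, mean_grad, op=MPI.SUM)  # sum gradients across ranks
        params -= 0.01 * (mean_grad / size)                # synchronous SGD update

    if rank == 0:
        print(f"trained on {size} ranks, param norm {np.linalg.norm(params):.3f}")
        # rank 0 would upload the model checkpoint as the job output here


if __name__ == "__main__":
    main()
```

Such a script would be launched with mpirun -n <ranks> python train.py, or with aprun on Cray systems as in the diagram.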
iDDS 1/2
➢ iDDS : intelligent Data Delivery Service
➢ An intelligent service to preprocess and deliver data to consumers (a toy sketch of the flow follows below)
  – Delivered data = files, file fragments, file information, or sets of files
[Diagram: the requester sends a request with input-data info to the iDDS head; the iDDS agent downloads data from the source storage or an external service, preprocesses it, and uploads it to the destination storage via temporary / cache / hop storage; the consumer is notified, gets and processes the delivered data, and reports back, after which iDDS deletes the temporary data]
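Purely as an illustration of the flow in the diagram (this is not the iDDS API; every name here is hypothetical), a single request could be handled roughly like this:

```python
# Hypothetical, simplified agent loop for one request: fetch, preprocess,
# deliver, notify, clean up. None of these names come from iDDS itself.
import shutil
from pathlib import Path


def handle_request(source_path, destination_dir, preprocess, notify_consumer,
                   staging_dir="/tmp/idds-staging"):
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)

    staged = staging / Path(source_path).name
    shutil.copy(source_path, staged)        # "download" from source storage

    delivered = Path(destination_dir) / staged.name
    preprocess(staged, delivered)           # e.g. extract the requested event ranges

    notify_consumer(delivered)              # consumer processes and reports back
    staged.unlink()                         # temporary copy is deleted afterwards
```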
iDDS 2/2
➢ Joint project between ATLAS and IRIS-HEP
➢ Generalizes the concept/workflow of the Event Streaming Service
➢ Not a storage, WFMS, or DDMS
  – Delegates many functions to the WFMS, DDMS, and Cache
➢ iDDS + WFMS (as preprocessing backend) + DDMS + Cache = CDN
➢ Requirements
  – Experiment agnostic
  – Flexibility to support more use cases and backend systems
  – Easy and cheap deployment
➢ ATLAS use cases
  – Fine-grained processing
  – Tape carousel and dynamic data placement
  – Data delivery over the WAN
  – On-demand data transfers at HPCs
  – Custom data transformation for hyperparameter optimization
  – ...
➢ Potentially a huge R&D effort, but ATLAS manpower is limited for now
➢ Splinter meeting at the S&C workshop next week in NY to reach a consensus in ATLAS before the project “officially” kicks off
  – Collaboration with other projects
  – Manpower allocation