WIR SCHAFFEN WISSEN – HEUTE FÜR MORGEN
Stephan Egli :: Paul Scherrer Institut :: Photon Science Department
DaaS and Kubernetes at PSI
CALIPSOplus JRA2 Meeting, May 23rd 2018
Experiences gained at PSI
• Purpose of this talk:
  − What existing solutions do we have so far?
  − What new options might we explore further in the future?
• Intention: provide input for the discussion:
  − How do we merge the best ideas and experiences gained at our different sites?
  − Which building blocks should be part of a new blueprint?
• Illustrate the need for extensive tests and exploration of options in order to make the right decisions in time; it is therefore important to compare our experiences with each other.
• Disclaimer: I only summarize the situation; all errors and omissions are my fault. The results achieved are due to the long-term commitment and the tremendous efforts of the colleagues from the Science IT department of PSI and the colleagues from the ESS / Data Archive project!
Data Analysis as a Service Project
• See webpage: https://www.psi.ch/photon-science-data-services/offline-computing-facility-for-sls-and-swissfel-data-analysis
• Main goal: make offline data analysis of large datasets easier for researchers
• Needs a sophisticated, high-performance storage infrastructure with
  − good connectivity to the online systems
  − good software environments and support
• Current typical usage: between 30000 and 60000 CPU hours per group and month for the main user groups cSAXS, TOMCAT, MX and SwissFEL
DaaS Infrastructure Overview
Online-Offline Connectivity SLS
Spectrum Scale GPFS Active File Management
Software Environment, Expert Support
• Provide standard software for interactive analysis and visualization, such as MATLAB, Mathematica and IPython environments
  − in different versions as well as domain-specific variants
  − an extended environment module system (p-modules) to mitigate the problem of providing different software versions and development environments to different researchers and for different architectures
• Provide ready-to-use scientific software packages, e.g. MX: solve protein structures from SLS and FEL data, collected using both conventional methods (rotating sample) and serial crystallography methods
• Provide software development environments that allow researchers to build, develop and refine their scientific codes
• Provide support for different compiler chains (GCC, Intel, OpenMP, MPI, CUDA)
• Provide help to scientists in tuning and optimizing their algorithms; this often gives the largest overall performance boost, but it needs local experts knowledgeable in both the science and in IT algorithms and code optimization, e.g. for running parallelized ptychographic reconstruction codes
• Jupyter notebooks for web-based interactive work
Interactive analysis with Jupyter Notebooks (screenshot: Environment, Cluster, Queues)
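One common way to connect such a notebook front end to cluster queues is JupyterHub together with the batchspawner project; the following jupyterhub_config.py fragment is only a hedged sketch of that approach, not the actual PSI configuration, and the partition, runtime and memory values are placeholders.

```python
# jupyterhub_config.py -- minimal sketch, assuming the batchspawner package
# (https://github.com/jupyterhub/batchspawner). This file is loaded by JupyterHub,
# which provides get_config(); all resource values below are illustrative only.

c = get_config()  # noqa: F821

# Spawn each user's notebook server as a SLURM batch job instead of a local process.
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Resources requested for the notebook job (placeholder values, hypothetical partition name).
c.SlurmSpawner.req_partition = 'daily'
c.SlurmSpawner.req_runtime = '08:00:00'
c.SlurmSpawner.req_memory = '8G'
```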
Further components in use
• SLURM as the batch scheduler; resource management is done by integrating with Linux cgroups (a minimal submission sketch follows below)
• Remote access via NoMachine: classical GUI login, all users see the same environment and then either work interactively or submit batch jobs
• Remote data transfer: Globus Online (GridFTP-based), rsync for special use cases
• Integration with the data catalog and archive system (see later)
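For illustration only (not a description of the actual PSI workflow), a batch analysis job could be submitted to SLURM from Python roughly as follows; the partition, resource limits and script name are placeholders.

```python
import subprocess

# Minimal sketch: submit an analysis job to SLURM via sbatch.
# Partition name, resource limits and the job script are placeholders.
job_script = "recon_job.sh"   # hypothetical job script

result = subprocess.run(
    [
        "sbatch",
        "--partition=daily",    # placeholder partition
        "--time=04:00:00",      # wall-clock limit
        "--cpus-per-task=16",
        "--mem=64G",
        job_script,
    ],
    capture_output=True,
    text=True,
    check=True,
)

# sbatch prints e.g. "Submitted batch job 123456"; keep the job id for later polling.
job_id = result.stdout.strip().split()[-1]
print("submitted SLURM job", job_id)
```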
Data Catalog and Container Orchestration
• The data catalog is an important component of the overall data management life cycle
  − gateway to the archive system for long-term storage
  − necessary component to implement the data policy
  − challenge: integration into existing and historically grown environments demands a flexible framework
• We use the SciCat data catalog, see https://github.com/SciCatProject
• The architecture is based on microservices, which are very well suited to run in containers
• This needs a container orchestration platform; we chose Kubernetes (a small programmatic example follows below)
• Experience with Kubernetes is very good, both in terms of functionality and operational stability; it was initially built for long-running web-service-type applications
• The persistency layer is implemented via MongoDB
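As an illustration of operating such services on Kubernetes, the kind of overview shown on the dashboard slides that follow can also be retrieved with the official Kubernetes Python client; this is a minimal sketch, and the namespace names are placeholders, not the actual PSI environments.

```python
from kubernetes import client, config

# Minimal sketch using the official Kubernetes Python client.
# Namespace names are placeholders for test/production catalog environments.
config.load_kube_config()          # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

for namespace in ("scicat-test", "scicat-prod"):   # hypothetical namespaces
    pods = core.list_namespaced_pod(namespace)
    for pod in pods.items:
        # Print the same kind of summary the dashboard shows: name, phase, restarts.
        restarts = sum(cs.restart_count for cs in (pod.status.container_statuses or []))
        print(f"{namespace}/{pod.metadata.name}: {pod.status.phase}, restarts={restarts}")
```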
Overall Data Catalog Architecture
Kubernetes Dashboard: overview of all test and prod environments
Single Pod Info
Beamline Ingestors based on Node-RED
Data catalog GUI, user view
Scientific Metadata View
Access to the data catalog via the OpenAPI REST API
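To give a feel for this REST access, here is a hedged sketch of querying datasets with Python requests; the endpoint paths follow the SciCat/LoopBack conventions as I understand them, and the base URL, credentials, owner group and result fields are placeholders, not a description of the production setup.

```python
import json
import requests

# Minimal sketch of talking to a SciCat REST API (LoopBack-style endpoints).
# Base URL, credentials and the query filter are placeholders.
BASE_URL = "https://scicat.example.org/api/v3"     # hypothetical catalog endpoint

# 1) Log in to obtain an access token.
login = requests.post(
    f"{BASE_URL}/Users/login",
    json={"username": "ingestor", "password": "secret"},   # placeholder credentials
)
login.raise_for_status()
token = login.json()["id"]

# 2) Query datasets belonging to one owner group, newest first.
query_filter = {
    "where": {"ownerGroup": "p12345"},      # hypothetical owner group
    "order": "creationTime DESC",
    "limit": 5,
}
resp = requests.get(
    f"{BASE_URL}/Datasets",
    params={"filter": json.dumps(query_filter)},
    headers={"Authorization": token},
)
resp.raise_for_status()
for ds in resp.json():
    print(ds.get("pid"), ds.get("datasetName"), ds.get("size"))
```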
Containers for Data Analysis
• Disclaimer: only minimal own experience so far
• Potential advantages:
  − adaptability to existing environments at different sites
  − containers allow providing OS environments tailored to the needs of the different scientist groups
  − containers make it easier to share full work environments
• New container implementations for better HPC support:
  − Shifter-NG: Linux containers for HPC (NERSC, CSCS; tested with an HEP application together with Science IT). Allows an HPC system to let end users run Docker images efficiently and safely. Integration with batch scheduler systems, a security model oriented towards HPC systems, native performance of custom HPC hardware, compatible with Docker.
  − Singularity: "Mobility of Compute", see http://singularity.lbl.gov/. Leverages resources like HPC interconnects, resource managers, file systems, GPUs and/or accelerators, etc.
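As a toy illustration of the container approach (no claim about any production setup), an analysis step could be wrapped in a Singularity container from Python roughly as follows; the image name, bind path and command are placeholders.

```python
import subprocess

# Minimal sketch: run one analysis step inside a Singularity container.
# Image name, bind mount and the command line are placeholders.
image = "analysis-env.simg"           # hypothetical container image
data_dir = "/gpfs/experiment/p12345"  # hypothetical parallel-filesystem path

subprocess.run(
    [
        "singularity", "exec",
        "--bind", f"{data_dir}:{data_dir}",   # expose the shared filesystem inside
        image,
        "python", "reconstruct.py", "--input", data_dir,   # hypothetical analysis command
    ],
    check=True,
)
```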
Kubernetes for Data Analysis?
• Originally its main application area was (long-running) web services
• Can be exploited for Jupyter notebooks (ready-to-use Helm charts)
• Meanwhile the Kubernetes concepts have been extended: Job/batch resources (see the sketch below)
• Ideas exist for integrating Shifter/Singularity-type containers with Kubernetes (OCI-compliant runtimes): https://www.sylabs.io/2018/03/singularity-oci-cloud-native-enterprise-performance-computing/
• Remark: Kubernetes is also planned to be used by the controls colleagues for the machine and beamline control system infrastructure
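To make the Job/batch idea concrete, here is a minimal sketch of submitting a one-off analysis Job through the official Kubernetes Python client; the image, namespace, command and resource requests are placeholders, and this is not an existing PSI workflow.

```python
from kubernetes import client, config

# Minimal sketch: run a one-off analysis step as a Kubernetes Job.
# Image name, namespace, command and resource requests are placeholders.
config.load_kube_config()
batch = client.BatchV1Api()

container = client.V1Container(
    name="analysis",
    image="analysis-env:latest",              # hypothetical analysis image
    command=["python", "reconstruct.py"],     # hypothetical entry point
    resources=client.V1ResourceRequirements(
        requests={"cpu": "4", "memory": "8Gi"},
    ),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="recon-job-001"),
    spec=client.V1JobSpec(
        backoff_limit=2,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
        ),
    ),
)

batch.create_namespaced_job(namespace="analysis-test", body=job)   # hypothetical namespace
```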
Some Open Points and Questions
• If we make use of container technology in an HPC/HTC environment:
  − Which container image type(s) should we use? Should it be Docker-compatible in any case?
  − How do we overcome Docker's limitations? Docker's main design goal is to provide completely self-contained container images, while an HPC cluster is always built around sharing some particularly efficient hardware components. Is its layered container image format inefficient on parallel file systems?
  − How do we handle storage resources efficiently for HTC applications? This implies integration of parallel file systems, network performance and security aspects.
  − How do we manage resources (batch systems vs. container orchestration, HPC cluster vs. "cloud")? See e.g. https://kubernetes.io/blog/2017/08/kubernetes-meets-high-performance/. Do we choose one or the other, or both merged in some way?
  − Do containers make the virtualization layer unnecessary? Or do we still need it, e.g. for optimal reproducibility?
Tools (in use) at other sites
• CERN, for HEP use cases:
  − Reusable analysis platform REANA/RECAST (http://github.com/reanahub, http://github.com/recast-hep): a workflow engine where each step is a Kubernetes Job
  − HTCondor and Docker/Kubernetes: https://zenodo.org/record/1042367/files/clenimar-report-cern-final.pdf
• SDSC/EPFL: Renga, http://renga.readthedocs.io/en/latest/
  − Securely manage, share and process large-scale data across untrusted parties operating in a federated environment
  − Automatically capture complete lineage up to the original raw data for detailed traceability, auditability and reproducibility
• AiiDA, http://www.aiida.net/: Automated Interactive Infrastructure and Database for Computational Science, and Materials Cloud, https://www.materialscloud.org/home: a platform for open science
Summary
• This is just a sketch of the situation as far as I am aware of it
• There are a lot of interesting developments currently ongoing
• The whole topic is work in progress, constantly moving and adapting
• The future path(s) still need to be explored by all of us, and sharing our experiences will help
• Finding good solutions while at the same time minimizing the risks favors an iterative approach and requires the resources and willingness to test, implement (or abandon) solutions
Acknowledgements
• Thanks go to all colleagues in the IT department who were involved, in particular Science IT and the colleagues from the ESS project