Efficient unpacking of required software from CernVM-FS
Samuel Teuber • EP-SFT Openlab Summer Student
Nicholas Hazekamp, Jakob Blomer, Gerardo Ganis
13.08.2018
Why is this necessary? (1)
Challenges faced in some HPC environments (e.g. NERSC):
● Lack of internet connection on compute nodes
● Lack of local hard disks
● Lack of system-level privileges
Why is this necessary? (2)
Challenges faced in benchmarking:
● Run the same workflow every time
● Minimize storage needs
● Minimize external factors (e.g. internet connection)
How are these challenges tackled today?
● cvmfs_preload: no internet connection → prepopulate the CVMFS cache; no hard disk → keep the cache on the HPC file system
● uncvmfs: no FUSE client → download entire CVMFS repositories and filter afterwards
● Benchmarking: ?
Shrink Wrapping
A method for efficiently packaging required software from CVMFS into standalone images
Building a specification describing the necessary files for a software run
Specification → Export → Run independently
● Specification, e.g.:
    ^/bar/etc/*
    /bar/Modules/setup.sh
    /foo/Packages/ROOT/*
    ^/foo/Packages/AliRoot/*
● Export with cvmfs_shrinkwrap (tar, squashfs, docker, ...)
● Run independently: no internet, no disk cache, no FUSE client
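To make the specification format more concrete, here is a minimal Python sketch of a path matcher for lines like the example above. The matching rules are an assumption for illustration, not taken from the slides or the cvmfs_shrinkwrap documentation: a plain line selects exactly one path, a trailing `/*` selects everything below a directory recursively, and a leading `^` restricts the wildcard to direct children only. The names `SpecEntry`, `parse_spec` and `is_included` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpecEntry:
    prefix: str        # path with any leading "^" and trailing "/*" stripped
    wildcard: bool     # line ended in "/*": match entries below `prefix`
    direct_only: bool  # ASSUMED meaning of "^": direct children only, no recursion


def parse_spec(text: str) -> List[SpecEntry]:
    """Parse one specification entry per non-empty, non-comment line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        direct_only = line.startswith("^")
        if direct_only:
            line = line[1:]
        wildcard = line.endswith("/*")
        if wildcard:
            line = line[:-2]
        entries.append(SpecEntry(line, wildcard, direct_only))
    return entries


def is_included(path: str, spec: List[SpecEntry]) -> bool:
    """Return True if `path` is selected by any entry of the specification."""
    for entry in spec:
        if not entry.wildcard:
            # Plain line: exactly this path.
            if path == entry.prefix:
                return True
            continue
        if not path.startswith(entry.prefix + "/"):
            continue
        rest = path[len(entry.prefix) + 1:]
        # Recursive wildcards match any depth; "^" entries match one level only.
        if not entry.direct_only or "/" not in rest:
            return True
    return False
```

Under these assumed rules, `/bar/etc/profile` and `/foo/Packages/ROOT/v6/bin/root` are selected, while `/bar/etc/sub/file` is not, because `^/bar/etc/*` only reaches one level down.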
Application design
Image architecture
/cvmfs/
    repo.cern.ch/      Exported repository structure (hardlinks into .data/)
    .data/00/ … ff/    Content-addressed file links
    .garbage/          Garbage collection information
    .provenance/       Information for image reproducibility
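The point of this layout is that each distinct file body is stored once under `.data/` and the exported repository tree only holds hardlinks to it. A minimal sketch of that idea, assuming SHA-1 content hashes and a two-hex-digit fan-out like `00/ … ff/`; `content_hash` and `add_file` are hypothetical names, and `.garbage/` and `.provenance/` are not modelled.

```python
import hashlib
import os
import shutil


def content_hash(path: str) -> str:
    """Hash a file's content in 1 MiB chunks (SHA-1 is an assumption here)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def add_file(image_root: str, repo: str, rel_path: str, src: str) -> str:
    """Store `src` once under .data/ and hardlink it into the exported tree."""
    digest = content_hash(src)
    # Content-addressed location: .data/<first two hex digits>/<rest of hash>
    data_path = os.path.join(image_root, ".data", digest[:2], digest[2:])
    if not os.path.exists(data_path):
        os.makedirs(os.path.dirname(data_path), exist_ok=True)
        shutil.copyfile(src, data_path)
    # The visible repository path is just another name for the stored object.
    dest = os.path.join(image_root, repo, rel_path.lstrip("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        os.link(data_path, dest)
    return data_path
```

Two repository files with identical content end up as two hardlinks to the same `.data/` object, so duplicated content costs no extra space in the image.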
FS Traversal
● Single-threaded (for now)
● Matches paths to specification (~3 µs lookup)
● In-memory ls for directories in specification
Thread pool
● Copies files between abstract interfaces (extendible to other FS architectures)
● Responsible for IO-intensive copying
● Responsible for file creation & hardlinks
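The split between a single-threaded traversal and a copying thread pool can be sketched in a few lines of Python with `concurrent.futures`. This is an illustration of the design, not the cvmfs_shrinkwrap implementation: plain POSIX directories stand in for the abstract source and destination interfaces, and `shrinkwrap_copy` and its `include` predicate are hypothetical names.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def shrinkwrap_copy(src_root: str, dst_root: str,
                    include: Callable[[str], bool], workers: int = 4) -> List[str]:
    """Walk `src_root` on one thread; copy the selected files on a pool."""
    copied = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        # Traversal: cheap, single-threaded path matching.
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in sorted(filenames):
                src = os.path.join(dirpath, name)
                rel = "/" + os.path.relpath(src, src_root).replace(os.sep, "/")
                if not include(rel):
                    continue
                dst = os.path.join(dst_root, rel.lstrip("/"))
                # Create parent directories before handing the job to a worker.
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                # Thread pool: the IO-intensive file creation and copying.
                futures.append(pool.submit(shutil.copyfile, src, dst))
                copied.append(rel)
        for future in futures:
            future.result()  # re-raise any copy error
    return sorted(copied)
```

Keeping the matching on one thread avoids synchronising the specification lookup, while the slow part (reading and writing file content) runs in parallel. Any predicate can be passed as `include`, for example a specification matcher.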
Docker Injection
Replacing CVMFS Docker layers
Layer stack (top to bottom):
● CVMFS Layer
● Custom Container Layers 1
● OS Base Layer
1. Identify CVMFS layer: hash of layer stored as image label
2. Download "old" layer version
3. Update through shrinkwrap utility
4. Upload "new" layer version
5. Update image labels & manifest
Layer stack (top to bottom): CVMFS Layer, Custom Container Layers 1, OS Base Layer
1. Identify CVMFS layer: hash stored as image label
2. Download "old" image version
3. Update through shrinkwrap utility
4. Upload "new" image version
5. Update image labels
Layer stack (top to bottom): New Custom Container Layers 2, CVMFS Layer, Custom Container Layers 1, OS Base Layer
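Steps 1 and 5 amount to bookkeeping on the image metadata: a label records which layer holds the CVMFS content, so that layer can be swapped in place while the layers above and below it stay untouched. A minimal sketch on plain dictionaries shaped loosely like an OCI manifest (`layers`, base layer first) and image config (`config.Labels`); the label key `CVMFS_LABEL` and the function name are hypothetical, and steps 2 to 4 (download, update, upload) are assumed to happen elsewhere.

```python
import copy
from typing import Any, Dict, Tuple

CVMFS_LABEL = "org.example.cvmfs-layer"  # hypothetical label key


def replace_cvmfs_layer(manifest: Dict[str, Any], config: Dict[str, Any],
                        new_layer: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Swap the labelled CVMFS layer for `new_layer`, keeping all other layers."""
    manifest = copy.deepcopy(manifest)
    config = copy.deepcopy(config)
    labels = config.setdefault("config", {}).setdefault("Labels", {})
    # Step 1: the label tells us which layer is the CVMFS layer.
    old_digest = labels[CVMFS_LABEL]
    layers = manifest["layers"]
    idx = next((i for i, layer in enumerate(layers)
                if layer["digest"] == old_digest), None)
    if idx is None:
        raise ValueError(f"no layer with digest {old_digest}")
    # Replace in place: layers added on top later keep their position.
    layers[idx] = new_layer
    # Step 5: point the label at the new layer.
    labels[CVMFS_LABEL] = new_layer["digest"]
    return manifest, config
```

This covers both cases on the slides, since the CVMFS layer is found by its label rather than by position. A real registry update would also have to refresh the config's `rootfs.diff_ids` and the manifest's config digest, which this sketch leaves out.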
Example & Evaluation
From a vanilla Docker image...
    FROM centos:7
    # ...
    ADD HEP_OSlibs.repo /etc/yum.repos.d/HEP_OSlibs.repo
    RUN yum install -y HEP_OSlibs
...to a CVMFS-injected image that can run ROOT demos
Command line*:
    $ cvmfs_shrinkwrap oci ROOT/root-demo -c hub.docker.com.conf
    Making image CVMFS injectable (injecting empty CVMFS layer)...
    Generating local copy of specified cvmfs repository subset...
    Packing tar layer...
    Compressing to gzip...
    Injecting updated cvmfs layer into hub.docker.com/ROOT/root-demo...
* Command line interface interaction is still subject to change
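The "Packing tar layer..." and "Compressing to gzip..." steps can be illustrated with Python's `tarfile` and `gzip` modules. This sketch assumes the exported tree is packed under a top-level `cvmfs` entry (the `arcname` default is a hypothetical choice) and computes the two SHA-256 digests an OCI image uses for a layer: one over the compressed blob and one (the diff ID) over the uncompressed tar. `pack_layer` is a hypothetical name.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Tuple


def pack_layer(root: str, arcname: str = "cvmfs") -> Tuple[bytes, str, str]:
    """Pack `root` into a gzip-compressed tar layer.

    Returns (compressed blob, digest of the blob, digest of the plain tar).
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        # Files sharing an inode (hardlinks) are stored as tar link entries.
        tar.add(root, arcname=arcname)
    tar_bytes = raw.getvalue()
    # mtime=0 keeps the gzip header, and thus the digest, reproducible.
    compressed = gzip.compress(tar_bytes, mtime=0)
    digest = "sha256:" + hashlib.sha256(compressed).hexdigest()
    diff_id = "sha256:" + hashlib.sha256(tar_bytes).hexdigest()
    return compressed, digest, diff_id
```

Because `tarfile` records repeated inodes as hardlink entries, the deduplicated `.data/` layout survives packing instead of being expanded into separate copies.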
~70 MB/s export data rate with a warm cache, from CVMFS to a POSIX folder
Tracing & Specification Building
A method for automated image specification
Automated specification building based on a trace file
Trace → Specification → Export → Run independently
● Trace by enabling CVMFS_TRACEFILE during the workflow
● Export with cvmfs_shrinkwrap (tar, squashfs, docker, ...)
● Run independently: no internet, no disk cache, no FUSE client
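Turning a trace into a specification is mostly deduplication plus optional compaction. The sketch below assumes each trace record is a CSV row whose second field is the accessed path; the actual CVMFS trace format may differ. The directory-collapsing heuristic, its `threshold`, and the reading of `^dir/*` as "direct children of dir only" are likewise hypothetical. Function names `paths_from_trace` and `collapse_directories` are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import List


def paths_from_trace(trace_text: str, path_column: int = 1) -> List[str]:
    """Collect the unique absolute paths recorded in a trace log.

    ASSUMPTION: every record is a CSV row whose `path_column` field holds the
    accessed path; the real CVMFS trace format may differ.
    """
    paths = set()
    for row in csv.reader(io.StringIO(trace_text)):
        if len(row) > path_column and row[path_column].startswith("/"):
            paths.add(row[path_column])
    return sorted(paths)


def collapse_directories(paths: List[str], threshold: int) -> List[str]:
    """Replace >= `threshold` files of one directory by a single wildcard line.

    Hypothetical heuristic: trades precision (unaccessed siblings get
    included) for a shorter specification. "^dir/*" is ASSUMED to mean
    "direct children of dir only".
    """
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[p.rsplit("/", 1)[0]].append(p)
    spec = []
    for directory, files in by_dir.items():
        if len(files) >= threshold:
            spec.append("^" + directory + "/*")
        else:
            spec.extend(files)
    return sorted(spec)
```

A trace usually lists the same file many times (every open or stat), so deduplication alone shrinks it considerably; collapsing busy directories shrinks it further at the cost of exporting some files the run never touched.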
[Figure: line counts, > O(1)k lines and > 50k lines]
Future Work
● Improve the shrink wrapping workflow: understand the exact use cases and optimize the system based on these needs
● Improve automated specification building: make use of traces from multiple software runs to build more reliable specifications
● Direct exports to formats other than POSIX? It might be more efficient to avoid the "middleman"