Efficient unpacking of required software from CernVM-FS


  1. Efficient unpacking of required software from CernVM-FS. Samuel Teuber, EP-SFT Openlab Summer Student; Nicholas Hazekamp, Jakob Blomer, Gerardo Ganis; 13.08.2018

  2. Why is this necessary? (1) Challenges faced in some HPC environments (e.g. NERSC): lack of internet connection on compute nodes; lack of local hard disks; lack of system-level privileges.

  3. Why is this necessary? (2) Challenges faced in benchmarking: run the same workflow every time; minimize storage needs; minimize external factors (e.g. internet connection).

  4. How are these challenges tackled today? cvmfs_preload: for no internet connection, prepopulate the CVMFS cache; for no hard disk, cache on the HPC file system. uncvmfs: for no FUSE client, download entire CVMFS repositories and filter afterwards. Benchmarking: ?

  5. https://assets.nst.com.my/images/articles/26_bajajaaa_1521955064.jpg

  6. Shrink Wrapping: a method for efficiently packaging required software from CVMFS into standalone images. (Image: https://en.wikipedia.org/wiki/Stretch_wrap#/media/File:Pallet_wrapper.jpg)

  7. Building a specification describing the necessary files for a software run. Workflow: Specification → Export → Run Independently (no internet, no disk cache, no FUSE client). Example specification entries:

       ^/bar/etc/*
       /bar/Modules/setup.sh
       /foo/Packages/ROOT/*
       ^/foo/Packages/AliRoot/*

     Export with cvmfs_shrinkwrap (tar, squashfs, docker, ...).
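
The slide lists specification entries but does not define their syntax. As a rough illustration only, the sketch below matches paths against such entries under two loudly stated assumptions: a trailing `/*` selects a whole subtree, and a leading `^` restricts the entry to the direct children of the directory. The function names `parse_spec` and `is_selected` are hypothetical and not part of cvmfs_shrinkwrap.

```python
from pathlib import PurePosixPath


def parse_spec(lines):
    """Split specification lines into (recursive, flat, exact) rule sets."""
    recursive, flat, exact = set(), set(), set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("^") and line.endswith("/*"):
            flat.add(line[1:-2])       # ASSUMPTION: direct children only
        elif line.endswith("/*"):
            recursive.add(line[:-2])   # ASSUMPTION: the whole subtree
        else:
            exact.add(line)            # a single named path
    return recursive, flat, exact


def is_selected(path, rules):
    """Return True if path is covered by any rule (set lookups only)."""
    recursive, flat, exact = rules
    if path in exact:
        return True
    p = PurePosixPath(path)
    if str(p.parent) in flat:
        return True
    return any(str(ancestor) in recursive for ancestor in p.parents)


rules = parse_spec([
    "^/bar/etc/*",
    "/bar/Modules/setup.sh",
    "/foo/Packages/ROOT/*",
    "^/foo/Packages/AliRoot/*",
])
print(is_selected("/foo/Packages/ROOT/lib/libCore.so", rules))
```

Keeping the rules in hash sets means each lookup costs a few set probes per path component, which is one plausible way to keep per-path matching cheap.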

  8. Application design

  9. Image architecture

       /cvmfs/
         repo.cern.ch/     Exported repository structure (hardlinks)
         .data/00/ … ff/   Content-addressed file links
         .garbage/         Garbage collection information
         .provenance/      Information for image reproducibility
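
To make the layout above concrete, here is a minimal sketch of content-addressed storage with hardlinks: each distinct file content is written once under `.data/<first two hex digits>/` and then hardlinked into the exported repository tree. The SHA-1 hash, the exact file naming inside `.data/`, and the helper name `store_file` are assumptions for illustration; the slide does not specify them.

```python
import hashlib
import os


def store_file(image_root, repo_path, content):
    """Write content once under .data/ and hardlink it at repo_path."""
    digest = hashlib.sha1(content).hexdigest()  # ASSUMPTION: SHA-1 naming
    data_dir = os.path.join(image_root, ".data", digest[:2])
    data_file = os.path.join(data_dir, digest[2:])
    os.makedirs(data_dir, exist_ok=True)
    if not os.path.exists(data_file):
        # First time this content is seen: store the bytes exactly once.
        with open(data_file, "wb") as f:
            f.write(content)
    link = os.path.join(image_root, repo_path.lstrip("/"))
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if os.path.lexists(link):
        os.remove(link)
    # The repository path is only another name for the stored file.
    os.link(data_file, link)
    return data_file
```

With this scheme, two repository paths with identical content share one inode, so duplicated files cost no extra space in the image.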

  10. FS Traversal (single threaded, for now): matches paths to the specification (~3 µs lookup); in-memory ls for directories in the specification; responsible for file creation & hardlinks. Thread pool: copies files between abstract interfaces (extendible to other fs architectures); responsible for IO-intensive copying.
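
The division of labour described above can be sketched as follows: one thread walks the source tree and decides what to export, while a worker pool performs the IO-heavy copies. This is a simplified sketch on plain POSIX directories, not the tool's implementation; `export_tree` and the `selected` predicate are hypothetical names, and hardlink handling is omitted.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def export_tree(src_root, dst_root, selected, workers=4):
    """Walk src_root in one thread; copy selected files in a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel_dir = os.path.relpath(dirpath, src_root)
            for name in filenames:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                # The predicate sees an absolute-style path such as /foo/bar.
                if not selected("/" + rel.replace(os.sep, "/")):
                    continue
                dst = os.path.join(dst_root, rel)
                # Directory creation stays in the traversal thread.
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                src = os.path.join(dirpath, name)
                # Only the data copy is handed to the pool.
                futures.append(pool.submit(shutil.copyfile, src, dst))
        for future in futures:
            future.result()  # re-raise any copy error
    return len(futures)
```

Keeping all namespace changes in one thread avoids races on directory creation, while the pool overlaps the slow reads and writes.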

  11. Docker Injection

  12. Replacing CVMFS Docker layers. Layer stack (top to bottom): CVMFS Layer; Custom Container Layers 1; OS Base Layer.

  13. Steps: (1) Identify the CVMFS layer: hash of the layer stored as an image label. (2) Download the “old” layer version. (3) Update it through the shrinkwrap utility. (4) Upload the “new” layer version. (5) Update image labels & manifest. Layer stack (top to bottom): CVMFS Layer; Custom Container Layers 1; OS Base Layer.

  14. Steps: (1) Identify the CVMFS layer: hash stored as an image label. (2) Download the “old” image version. (3) Update it through the shrinkwrap utility. (4) Upload the “new” image version. (5) Update image labels. Layer stack (top to bottom): New Custom Container Layers 2; CVMFS Layer; Custom Container Layers 1; OS Base Layer.
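
The bookkeeping part of the steps above (identify the layer through a label, swap its digest, update the label) can be sketched on plain dictionaries. The label key `cvmfs.layer.digest` and the function `replace_cvmfs_layer` are hypothetical; registry download and upload are left out, and a real update would presumably also need to rewrite the image configuration, since that is where Docker stores labels.

```python
import hashlib

CVMFS_LABEL = "cvmfs.layer.digest"  # hypothetical label key


def replace_cvmfs_layer(manifest, labels, new_layer_bytes):
    """Return a new manifest and labels with the labelled layer replaced."""
    old_digest = labels[CVMFS_LABEL]
    new_digest = "sha256:" + hashlib.sha256(new_layer_bytes).hexdigest()
    layers = []
    replaced = False
    for layer in manifest["layers"]:
        if layer["digest"] == old_digest:
            # Swap only the CVMFS layer; every other layer is kept as-is.
            layer = dict(layer, digest=new_digest, size=len(new_layer_bytes))
            replaced = True
        layers.append(layer)
    if not replaced:
        raise ValueError("labelled CVMFS layer not found in manifest")
    new_manifest = dict(manifest, layers=layers)
    new_labels = dict(labels, **{CVMFS_LABEL: new_digest})
    return new_manifest, new_labels
```

Because the layer is found by its recorded digest rather than by position, custom layers added above it later do not affect the replacement.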

  15. Example & Evaluation

  16. From a vanilla Docker image...

        FROM centos:7
        ...
        ADD HEP_OSlibs.repo /etc/yum.repos.d/HEP_OSlibs.repo
        RUN yum install -y HEP_OSlibs

  17. ...to a CVMFS-injected image that can run ROOT demos

        $ cvmfs_shrinkwrap oci ROOT/root-demo -c hub.docker.com.conf *
        Making image CVMFS injectable (injecting empty CVMFS layer)...
        Generating local copy of specified cvmfs repository subset...
        Packing tar layer...
        Compressing to gzip...
        Injecting updated cvmfs layer into hub.docker.com/ROOT/root-demo...

      * Command line interface interaction is still subject to change

  18. ~70 MB/s: export data rate with warm cache from CVMFS to a POSIX folder

  19. Tracing & Specification Building: a method for automated image specification

  21. Automated specification building based on a trace file. Workflow: Trace → Specification → Export → Run Independently (no internet, no disk cache, no FUSE client). Trace by enabling CVMFS_TRACEFILE during the workflow; export with cvmfs_shrinkwrap (tar, squashfs, docker, ...).

  22. > O(1) k lines >50k lines
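
One conceivable way to turn a long list of traced paths into a shorter specification is to collapse directories in which many files were accessed into a single wildcard entry. The sketch below is a hypothetical heuristic, not the method used by the tool: the input is assumed to be a list of accessed paths already extracted from the trace (the trace file format is not shown in the slides), the `threshold` parameter is invented, and `^dir/*` is assumed to mean "direct children of dir".

```python
import posixpath
from collections import defaultdict


def build_spec(accessed_paths, threshold=3):
    """Collapse directories with many accessed files into one entry."""
    by_dir = defaultdict(set)
    for path in accessed_paths:
        by_dir[posixpath.dirname(path)].add(path)  # sets drop repeated accesses
    spec = []
    for directory in sorted(by_dir):
        files = by_dir[directory]
        if len(files) >= threshold:
            # ASSUMPTION: "^dir/*" selects the direct children of dir.
            spec.append("^" + directory.rstrip("/") + "/*")
        else:
            spec.extend(sorted(files))
    return spec


trace = [
    "/foo/lib/a.so", "/foo/lib/b.so", "/foo/lib/c.so",
    "/foo/lib/a.so", "/bar/setup.sh",
]
print(build_spec(trace))
```

A wildcard entry may include files the traced run never touched, which trades a slightly larger image for a specification that tolerates small variations between runs.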

  23. Future Work. Improve the shrink wrapping workflow: understand the exact use cases and optimize the system based on these needs. Improve automated specification building: make use of traces from multiple software runs to build more reliable specifications. Direct exports to formats other than POSIX? Might be more efficient to avoid the “middleman”.
