Worker Node Software Management: the VO perspective, by Mark Santcroos and Dennis van Dok


  1. Worker Node Software Management: the VO perspective • Mark Santcroos • Dennis van Dok

  2. Introduction • e-BioScience group – Bioinformatics Laboratory – Clinical Epidemiology, Biostatistics and Bioinformatics – Academic Medical Centre, Amsterdam • Intermediary between medical researchers and the Dutch NGI • Supports a wide range of applications in Next Generation Sequencing and Medical Imaging

  3. Worker Node Software • Running on 15 sites in the Netherlands • Base worker node installation (glite-WN) • Proof of Concept (PoC) software installation, a legacy of the Virtual Laboratory for e-Science project (ended 2009)

  4. Perspective • Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid • Mark Santcroos is a VO manager for the vlemed VO

  5. Job / Application Scenarios • Use installed software • Application in Job Sandbox • Fetch Application using wrapper • Upgrade versions in PoC distribution • Lobby for new versions with Site admins

  6. Limitations • Sandbox solution has size limits • Sandbox and wrapper have network overhead • Installed version may be out of date or too new • Making end users responsible for maintaining applications is not always desirable • Site admins have to be in the loop

  7. High Level Goal • Have a flexible solution to make software available on the grid for end users that is also manageable from a VO admin perspective.

  8. Packaging Requirements • Automatic dependency resolution • Supported on Linux • Tools for install/update/remove/status • Running entirely in userspace, unprivileged • Multiple installed versions of the same software

  9. Unsuitable candidates • rpm/yum • deb/apt • portage • Arch User Repository • pacman • … • Reasons: too OS-specific, difficult to manage as an unprivileged user

  10. Pkgsrc • Originating in NetBSD • Supported on Linux • Self contained • Actively maintained • Can be used as a non-privileged user • Large collection of applications already packaged • Can make use of system provided dependencies • Allows maintaining a local set of packages • Could add packages to the main distribution • Supports binary and source packages

  11. Creating a package

      DISTNAME=               vlet-1.3.2
      CATEGORIES=             local
      MASTER_SITES=           http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
      EXTRACT_SUFX=           .zip

      MAINTAINER=             m.a.santcroos@amc.uva.nl
      HOMEPAGE=               http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
      COMMENT=                This is the VL-e Toolkit
      LICENSE=                apache-2.0

      NO_CONFIGURE=           yes
      NO_BUILD=               yes
      PKG_DESTDIR_SUPPORT=    user-destdir
      INSTALLATION_DIRS=      bin lib

      post-extract:
      	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

      .include "../../mk/bsd.pkg.mk"

  12. Package Tree Management • update-tree.sh – Pull upstream pkgsrc changes – Create tarball – Put on website
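The "create tarball" step of the tree update could look roughly like the sketch below. It is not the project's update-tree.sh: the helper name `make_tree_tarball` and all paths are hypothetical, and pulling upstream changes and publishing the result on the website are left out.

```python
import tarfile
from pathlib import Path


def make_tree_tarball(tree_dir: str, tarball_path: str) -> str:
    """Pack a package tree into a gzip-compressed tarball.

    Hypothetical helper; the real update-tree.sh may work differently.
    """
    tree = Path(tree_dir)
    if not tree.is_dir():
        raise NotADirectoryError(f"not a directory: {tree}")
    # Store everything under the tree's own directory name so that the
    # archive unpacks into a single top-level directory.
    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(str(tree), arcname=tree.name)
    return tarball_path
```

A worker node job can then fetch and unpack one file instead of synchronising a whole tree.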

  13. Implementation Principles • $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site • Run with a Software (VO) Manager proxy • Install packages per site / cluster / CE
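A job that wants to use the shared software area first has to find it. The sketch below resolves the variable from the environment. The name mangling (upper case, other characters replaced by underscores) is an assumption about how the VO name maps onto `$VO_[VONAME]_SW_DIR`, and the function names are hypothetical.

```python
import os
import re
from typing import Mapping, Optional


def sw_dir_variable(vo_name: str) -> str:
    """Name of the environment variable holding the VO software directory.

    Assumption: the VO name is upper-cased and every character that is not
    a letter or digit becomes an underscore.
    """
    return "VO_" + re.sub(r"[^A-Za-z0-9]", "_", vo_name).upper() + "_SW_DIR"


def sw_dir(vo_name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the shared software directory for a VO, or raise if unset."""
    env = os.environ if environ is None else environ
    variable = sw_dir_variable(vo_name)
    if variable not in env:
        raise RuntimeError(f"{variable} is not set; is this a worker node?")
    return env[variable]
```

For the vlemed VO this looks up `VO_VLEMED_SW_DIR`.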

  14. Architecture • [Diagram; labels: Management Server (UI), Jobs, Worker Nodes, Mount, Shared Storage Area]

  15. Managing packages • site-pkgtool.sh – Program to manage packages centrally – Initiates grid jobs • Install, Remove, Update • Init, Reinit, Check, Dump, Info, Version

  16. Script on the worker node • pkgsrc-cmd.sh – Wrapper program that runs on the worker node • Running as a grid job
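To illustrate how a central tool might hand an action to the worker node wrapper, the sketch below builds a JDL description for a management job. The JDL attributes used are standard, but the lower-case action names, the argument convention of `pkgsrc-cmd.sh` and the log file names are assumptions and not taken from the project code.

```python
# Actions listed on the "Managing packages" slide; the lower-case spelling
# used on the command line is an assumption.
ACTIONS = {
    "install", "remove", "update",
    "init", "reinit", "check", "dump", "info", "version",
}


def jdl_string(value: str) -> str:
    """Quote a value as a JDL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def management_job_jdl(action: str, package: str = "") -> str:
    """Build a JDL job description that runs the wrapper on a worker node."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    arguments = f"{action} {package}".strip()
    lines = [
        'Executable = "pkgsrc-cmd.sh";',
        f"Arguments = {jdl_string(arguments)};",
        'StdOutput = "stdout.log";',
        'StdError = "stderr.log";',
        'InputSandbox = {"pkgsrc-cmd.sh"};',
        'OutputSandbox = {"stdout.log", "stderr.log"};',
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be submitted with a Software Manager proxy, so that the job is allowed to write to the shared software area.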

  17. Information Management • list-installed-packages.sh – Display information about installed packages for sites • get-site-status.sh – Gather information from all supported sites • verify-package.sh – Check if a certain package is available on a site • get-tags.sh – Get all the package tags for the configured sites
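The kind of query that a check such as verify-package.sh answers can be sketched as a lookup over per-site tag sets. The data layout, site names and function name below are hypothetical; the actual scripts gather this information from the sites themselves.

```python
from typing import Dict, List, Set


def sites_with_tag(site_tags: Dict[str, Set[str]], tag: str) -> List[str]:
    """Return the sites (sorted by name) that publish the given tag."""
    return sorted(site for site, tags in site_tags.items() if tag in tags)


# Hypothetical snapshot of gathered tags, keyed by site name.
EXAMPLE_TAGS = {
    "site-a": {"VO-vlemed_SW_vlet-1.3.2"},
    "site-b": set(),
    "site-c": {"VO-vlemed_SW_vlet-1.3.2", "VO-vlemed_SW_other-0.1"},
}
```

With this snapshot, `sites_with_tag(EXAMPLE_TAGS, "VO-vlemed_SW_vlet-1.3.2")` selects site-a and site-c.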

  18. Installing a package • Check if distribution is fresh • Extract tree in scratch space • Build package and dependencies • Install package in shared software area • Install modulefile
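The installation steps run in a fixed order, and a failure should stop the job before anything is written to the shared area. The sketch below captures only that control flow. The step names are derived from the bullets above, while the `actions` mapping and its callables are hypothetical stand-ins for the real work.

```python
from typing import Callable, Dict, List

# Step names follow the bullets on the slide, in order.
INSTALL_STEPS = [
    "check_distribution_fresh",
    "extract_tree",
    "build_package",
    "install_to_shared_area",
    "install_modulefile",
]


def run_install(package: str, actions: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run every step in order; raise on the first step that reports failure.

    Each action receives the package name and returns True on success.
    """
    completed: List[str] = []
    for step in INSTALL_STEPS:
        if not actions[step](package):
            raise RuntimeError(
                f"step {step!r} failed for {package}; completed: {completed}"
            )
        completed.append(step)
    return completed
```

Because building happens in scratch space, a failed build in this scheme leaves the shared software area untouched.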

  19. Environment Modules • “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.” • Select versions • Setup environment • Integrates with system provided setup
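A modulefile for an installed package can be very small. The sketch below renders one as text, assuming the package keeps executables in `bin` and libraries in `lib` under its installation prefix. The function name and the example prefix are hypothetical, and the project's generated modulefiles may contain more than this.

```python
def render_modulefile(name: str, version: str, prefix: str) -> str:
    """Render a minimal Tcl modulefile for one installed package version."""
    lines = [
        "#%Module1.0",                                 # magic cookie, must come first
        f'module-whatis "{name} {version}"',
        f"conflict {name}",                            # one version loaded at a time
        f"prepend-path PATH {prefix}/bin",
        f"prepend-path LD_LIBRARY_PATH {prefix}/lib",
    ]
    return "\n".join(lines) + "\n"
```

Writing one such file per version, for example under a `vlet/1.3.2` path in the module search path, is what lets a job select a specific version with `module load vlet/1.3.2`.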

  20. Tags • Software Tags in Information System (BDII) • Publish installed software versions per CE • Used for resource selection by adding them to the “Requirements” of a JDL • Use lcg-ManageVOTag tool to publish tags • Structure of tags is VO-${vo}_SW_${package}
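As an illustration of how a tag ends up in a job description, the sketch below builds a tag following the structure given on the slide and wraps it in a JDL Requirements expression. The function names are hypothetical. Matching through `Member(...)` on `GlueHostApplicationSoftwareRunTimeEnvironment` is the usual way to select on published software tags, but whether the project's jobs use exactly this expression is an assumption.

```python
def software_tag(vo: str, package: str) -> str:
    """Build a tag with the structure VO-${vo}_SW_${package}."""
    return f"VO-{vo}_SW_{package}"


def jdl_requirement(tag: str) -> str:
    """JDL line that restricts a job to CEs publishing the given tag."""
    return (
        f'Requirements = Member("{tag}", '
        "other.GlueHostApplicationSoftwareRunTimeEnvironment);"
    )
```

For the vlemed VO and the vlet-1.3.2 package this gives the tag `VO-vlemed_SW_vlet-1.3.2`. Since the match is an exact string comparison, a job asking for one version will not match a site that only publishes a newer one.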

  21. Practical issues • Tags are not omnipresent • Shared area can become bottleneck • No intelligent matching on tags

  22. Conclusions • Flexible software management system • Relieves the user of the maintenance burden • Creating packages is still labor-intensive work

  23. Discussion • One size fits all? (Did we reinvent the wheel?) • Connect to EGI AppDB? • EMI Community Repositories? • Usable for data distribution? • Other mechanism for matching?

  24. Links • pkgsrc – http://www.netbsd.org/docs/software/packages.html • Modules – http://modules.sourceforge.net/ • BiG Grid – http://www.biggrid.nl/ • Bioinformatics Laboratory – http://www.bioinformaticslaboratory.nl/ • Project Code – http://dvandok.github.com/userspace-package-management/

  25. Acknowledgements • AMC Bioinformatics Laboratory – Prof. dr. Antoine van Kampen – Dr. Silvia Delgado Olabarriaga – Barbera van Schaik • BiG Grid / Nikhef – Jan Just Keijser

  26. Thanks!
