Worker Node Software Management: the VO perspective • Mark Santcroos • Dennis van Dok
Introduction • e-BioScience group – Bioinformatics Laboratory – Clinical Epidemiology, Biostatistics and Bioinformatics – Academic Medical Centre, Amsterdam • Intermediate between medical researchers and Dutch NGI • Support a wide range of applications in Next Generation Sequencing and Medical Imaging
Worker Node Software • Running on 15 sites in the Netherlands • Base worker node installation (glite-WN) • Proof of Concept (PoC) software installation, heritage of Virtual Laboratory for e-Science (ended 2009)
Perspective • Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid • Mark Santcroos is a VO manager for the vlemed VO
Job / Application Scenarios • Use installed software • Application in Job Sandbox • Fetch Application using wrapper • Upgrade versions in PoC distribution • Lobby for new versions with Site admins
Limitations • Sandbox solution has size limits • Sandbox and wrapper have network overhead • Installed version may be out of date or too new • Making end users responsible for maintaining applications is not always desirable • Site admins have to be in the loop
High Level Goal • Have a flexible solution to make software available on the grid for end users that is also manageable from a VO admin perspective.
Packaging Requirements • Automatic dependency resolution • Supported on Linux • Tools for install/update/remove/status • Running entirely in userspace, unprivileged • Multiple installed versions of the same software
Unsuitable candidates • rpm/yum • deb/apt • portage • Arch User Repository • pacman • … • Reasons: too OS specific, difficult to manage unprivileged
Pkgsrc • Originating in NetBSD • Supported on Linux • Self contained • Actively maintained • Can be used as a non-privileged user • Large collection of applications already packaged • Can make use of system provided dependencies • Allows maintaining a local set of packages • Could add packages to the main distribution • Supports binary and source packages
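The "non-privileged user" point can be illustrated with a minimal sketch of an unprivileged pkgsrc bootstrap. It assumes a pkgsrc tree has already been unpacked; the function name and the example paths are hypothetical, while `--unprivileged` and `--prefix` are options of the pkgsrc bootstrap script. By default the sketch only prints the command.

```shell
#!/bin/sh
# bootstrap_cmd PKGSRC_DIR PREFIX
# Print the command that bootstraps pkgsrc into a user-writable prefix.
# Both arguments are illustrative; they are not paths taken from the talk.
bootstrap_cmd() {
    # --unprivileged: build and install everything as the calling user
    # --prefix:       where bmake, the pkg_* tools and all packages end up
    printf 'cd %s/bootstrap && ./bootstrap --unprivileged --prefix %s\n' "$1" "$2"
}

# bootstrap_run PKGSRC_DIR PREFIX
# Actually perform the bootstrap (can take a long time; needs a compiler).
bootstrap_run() {
    ( cd "$1/bootstrap" && ./bootstrap --unprivileged --prefix "$2" )
}

# Show what would be done for a hypothetical layout under $HOME.
bootstrap_cmd "${HOME:-/tmp}/pkgsrc" "${HOME:-/tmp}/pkg"
```

On a grid site the prefix would presumably point into the VO software area rather than a home directory.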
Creating a package
DISTNAME=            vlet-1.3.2
CATEGORIES=          local
MASTER_SITES=        http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX=        .zip

MAINTAINER=          m.a.santcroos@amc.uva.nl
HOMEPAGE=            http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT=             This is the VL-e Toolkit
LICENSE=             apache-2.0

NO_CONFIGURE=        yes
NO_BUILD=            yes
PKG_DESTDIR_SUPPORT= user-destdir
INSTALLATION_DIRS=   bin lib

post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

.include "../../mk/bsd.pkg.mk"
Package Tree Management • update-tree.sh – Pull upstream pkgsrc changes – Create tarball – Put on website
Implementation Principles • $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site • Run with a Software (VO) Manager proxy • Install packages per site / cluster / CE
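As an illustration of the `$VO_[VONAME]_SW_DIR` principle, the following sketch derives the variable name from a VO name and resolves it on a worker node. It assumes the usual gLite naming convention (VO name in upper case, with dots and dashes turned into underscores); the function names are hypothetical and the VO name is trusted input.

```shell
#!/bin/sh
# vo_sw_dir_var VO
# Print the name of the software-area variable for a VO.
# Assumption: upper-case the VO name and map '.' and '-' to '_'.
vo_sw_dir_var() {
    vo_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')
    printf 'VO_%s_SW_DIR\n' "$vo_upper"
}

# vo_sw_dir VO
# Print the shared software directory, or fail if the site does not set it.
vo_sw_dir() {
    var=$(vo_sw_dir_var "$1")
    # Indirect lookup of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
        echo "$var is not set on this node" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

A job wrapper could call `vo_sw_dir vlemed` once and use the result as the installation prefix for everything that follows.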
Architecture • [Diagram with labels: Management Server (UI), Jobs, Worker Nodes, Mount, Shared Storage Area]
Managing packages • site-pkgtool.sh – Program to manage packages centrally – Initiates grid jobs • Install, Remove, Update • Init, Reinit, Check, Dump, Info, Version
Script on the worker node • pkgsrc-cmd.sh – Wrapper program that runs on the worker node • Running as a grid job
Information Management • list-installed-packages.sh – Display information about installed packages for sites • get-site-status.sh – Gather information from all supported sites • verify-package.sh – Check if a certain package is available on a site • get-tags.sh – Get all the package tags for the configured sites
Installing a package • Check if distribution is fresh • Extract tree in scratch space • Build package and dependencies • Install package in shared software area • Install modulefile
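The install steps could look roughly like the dry-run sketch below. This is not the project's `pkgsrc-cmd.sh`: the tarball location, the directory layout under the software area and the modulefile path are all assumptions made for illustration, and the freshness check is left out because the slides do not say how it is done. The `local` category matches the example package Makefile.

```shell
#!/bin/sh
# run CMD [ARG...]
# Execute a command, or only print it when DRY_RUN=yes (the default here).
run() {
    if [ "${DRY_RUN:-yes}" = "yes" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_package PKG SW_DIR SCRATCH
# PKG is a package directory name, SW_DIR the shared software area,
# SCRATCH a node-local scratch directory. All layouts are hypothetical.
install_package() {
    pkg=$1
    sw_dir=$2
    scratch=$3
    # Extract the package tree in scratch space.
    run tar -xzf "$sw_dir/pkgsrc.tar.gz" -C "$scratch"
    # Build the package and its dependencies; the bootstrapped bmake
    # installs into the prefix it was configured with (the shared area).
    run "$sw_dir/pkg/bin/bmake" -C "$scratch/pkgsrc/local/$pkg" install
    # Make the package selectable through Environment Modules.
    run cp "$scratch/pkgsrc/local/$pkg/files/modulefile" "$sw_dir/modulefiles/$pkg"
}
```

Calling `install_package vlet /sw /tmp/scratch` with the default `DRY_RUN=yes` prints the three commands instead of running them, which is a cheap way to review what a grid job would do before submitting it.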
Environment Modules • “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.” • Select versions • Setup environment • Integrates with system provided setup
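To make the modulefile idea concrete, here is a small sketch that generates a minimal modulefile for an installed package. The function name and the choice of variables are assumptions; `module-whatis` and `prepend-path` are standard modulefile commands, and the `bin` and `lib` directories follow the `INSTALLATION_DIRS` of the example package.

```shell
#!/bin/sh
# write_modulefile NAME VERSION PREFIX
# Emit a minimal Tcl modulefile on stdout. A real package may need more
# variables than PATH and LD_LIBRARY_PATH; this is only an illustration.
write_modulefile() {
    cat <<EOF
#%Module1.0
module-whatis "$1 $2 (installed via pkgsrc)"
prepend-path PATH $3/bin
prepend-path LD_LIBRARY_PATH $3/lib
EOF
}
```

Writing the output to a file such as `modulefiles/vlet/1.3.2` and adding that directory with `module use` would let a job select the version with `module load vlet/1.3.2`.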
Tags • Software Tags in Information System (BDII) • Publish installed software versions per CE • Used for resource selection by adding it to the “Requirements” of a JDL • Use lcg-ManageVOTag tool to publish tag • Structure of tags is VO-${vo}_SW_${package}
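A short sketch of how such a tag might be built and used in a JDL follows. The function names are hypothetical, and whether the version is part of `${package}` is an assumption; the `Requirements` expression uses the Glue attribute under which software tags are normally published.

```shell
#!/bin/sh
# sw_tag VO PACKAGE
# Build a tag following the structure given on the slide.
# Assumption: PACKAGE carries the version, e.g. "vlet-1.3.2".
sw_tag() {
    printf 'VO-%s_SW_%s\n' "$1" "$2"
}

# jdl_requirement TAG
# Print a JDL line that restricts matching to CEs publishing TAG.
jdl_requirement() {
    printf 'Requirements = Member("%s", other.GlueHostApplicationSoftwareRunTimeEnvironment);\n' "$1"
}

# Example: the line a user would add to a job description.
jdl_requirement "$(sw_tag vlemed vlet-1.3.2)"
```

Because `Member` is an exact string match, a job has to name one specific tag; it cannot ask for "any version newer than".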
Practical issues • Tags are not omnipresent • Shared area can become bottleneck • No intelligent matching on tags
Conclusions • Flexible software management system • Relieves the user of the burden of maintaining applications • Creating packages is still labor-intensive work
Discussion • One size fits all? (Did we reinvent the wheel?) • Connect to EGI AppDB? • EMI Community Repositories? • Usable for data distribution? • Other mechanism for matching?
Links • pkgsrc – http://www.netbsd.org/docs/software/packages.html • Modules – http://modules.sourceforge.net/ • BiG Grid – http://www.biggrid.nl/ • Bioinformatics Laboratory – http://www.bioinformaticslaboratory.nl/ • Project Code – http://dvandok.github.com/userspace-package-management/
Acknowledgements • AMC Bioinformatics Laboratory – Prof. dr. Antoine van Kampen – Dr. Silvia Delgado Olabarriaga – Barbera van Schaik • BiG Grid / Nikhef – Jan Just Keijser
Thanks!