  1. Interoperability via common Build & Test (BaT). Miron Livny, Computer Sciences Department, University of Wisconsin-Madison

  2. Thesis
  Interoperability of middleware can only be achieved if all components can be built and tested in a common Build & Test (BaT) infrastructure.
  • Necessary but not sufficient
  • The infrastructure must be production quality and distributed
  • The software must be portable
  • A community effort that leverages know-how and software tools

  3. Motivation
  › Experience with the Condor software
    • Includes external dependencies and interacts with external middleware
    • Ported to a wide range of platforms and operating systems
    • Increasing demand for automated testing
  › Experience with the Condor community
    • How Oracle has been using Condor for their build and test activities
    • Demand from "power users" for local BaT capabilities

  4. The NSF Middleware Initiative (NMI) Build and Test Effort

  5. GRIDS Center - Enabling Collaborative Science - Grid Research Integration Development & Support
  www.grids-center.org | www.nsf-middleware.org

  6. The NMI program
  • Program launched by Alan Blatecky in FY02
  • ~$10M per year
  • 6 "System Integrator" teams:
    - GRIDS Center: Architecture and Integration (ISI), Deployment and Support (NCSA), Testing (UWisc)
    - Grid Portals (TACC, UMich, NCSA, Indiana, UIC)
    - Instrument Middleware Architecture (Indiana)
    - NMI-EDIT (EDUCAUSE, Internet2, SURA)
  • 24 smaller awards developing new capabilities
  www.grids-center.org | www.nsf-middleware.org

  7. NMI Statement
  • Purpose: to develop, deploy and sustain a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment
  • The program encourages open source software development and the development of middleware standards
  www.grids-center.org | www.nsf-middleware.org

  8. The Build Challenge
  › Automation - "build the component at the push of a button!"
    • There is always more to it than just "configure" & "make"
    • e.g., ssh to the right host; cvs checkout; untar; setenv; etc. (see the sketch after this list)
  › Reproducibility - "build the version we released 2 years ago!"
    • Well-managed & comprehensive source repository
    • Know your "externals" and keep them around
  › Portability - "build the component on nodeX.cluster.com!"
    • No dependencies on "local" capabilities
    • Understand your hardware & software requirements
  › Manageability - "run the build daily on 15 platforms and email me the outcome!"
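
  A minimal sketch of the "push a button" idea above, written in Python: it scripts the checkout, configure, build, and install steps end to end and logs the outcome so the result can be reported automatically. The CVS server, module name, and install prefix are hypothetical placeholders, not the NMI facility's actual tooling.

```python
#!/usr/bin/env python3
"""Sketch of a "one button" build driver: every step is scripted and logged.
The repository, module name, and install prefix below are hypothetical."""
import subprocess
import sys

# The slide's point: a real build is more than "configure && make" --
# checkout, environment setup, and install steps all need automating too.
STEPS = [
    # (working directory, command)
    (".", ["cvs", "-d", ":pserver:anonymous@cvs.example.org:/cvsroot",
           "checkout", "myproject"]),
    ("myproject", ["./configure", "--prefix=/tmp/myproject-install"]),
    ("myproject", ["make"]),
    ("myproject", ["make", "install"]),
]

def run_build(log_path="build.log"):
    """Run each step in order, stop at the first failure, and keep a log so
    the outcome can be mailed or archived without anyone watching."""
    with open(log_path, "w") as log:
        for workdir, cmd in STEPS:
            log.write("### " + " ".join(cmd) + "\n")
            log.flush()
            result = subprocess.run(cmd, cwd=workdir,
                                    stdout=log, stderr=subprocess.STDOUT)
            if result.returncode != 0:
                log.write("FAILED: exit code %d\n" % result.returncode)
                return False
    return True

if __name__ == "__main__":
    sys.exit(0 if run_build() else 1)
```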

  9. The Testing Challenge
  › All the same challenges as builds (automation, reproducibility, portability, manageability), plus:
  › Flexibility
    • "test our RHEL4 binaries on RHEL5!"
    • "run our new tests on our old binaries"
    • It is important to decouple the build & test functions (see the sketch after this list)
    • Making tests just a part of a build -- instead of an independent step -- makes it difficult or impossible to:
      • run new tests against old builds
      • test one platform's binaries on another platform
      • run different tests at different frequencies
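
  A small Python sketch of the decoupling point above: the test step takes a previously built artifact as its input, so the same suite can be pointed at old builds or at binaries built on a different platform. The tarball name and the run_tests.sh entry point are hypothetical, not part of the NMI tooling.

```python
#!/usr/bin/env python3
"""Sketch of a test step decoupled from the build: it consumes an existing
binary tarball rather than running as a target inside the build itself.
The artifact layout and run_tests.sh entry point are hypothetical."""
import subprocess
import sys
import tarfile
import tempfile

def test_existing_build(artifact_tarball, test_cmd=("sh", "run_tests.sh")):
    """Unpack binaries produced earlier (possibly elsewhere, possibly long
    ago) into a scratch directory and run the current test suite on them."""
    with tempfile.TemporaryDirectory() as workdir:
        with tarfile.open(artifact_tarball) as tar:
            tar.extractall(workdir)
        # Nothing here depends on how, when, or where the binaries were
        # built, which is exactly what allows "RHEL4 binaries on RHEL5".
        result = subprocess.run(test_cmd, cwd=workdir)
        return result.returncode == 0

if __name__ == "__main__":
    ok = test_existing_build(sys.argv[1])
    print("PASS" if ok else "FAIL")
    sys.exit(0 if ok else 1)
```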

  10. Depending on our own software
  › What did we do?
    • We built the NMI Build & Test facility on top of Condor, Globus and other distributed computing technologies to automate the build, deploy, and test cycle.
    • To support it, we've had to construct and manage a dedicated, heterogeneous distributed computing facility.
    • It is the opposite extreme from a typical "cluster": instead of 1000s of identical CPUs, we have a handful of CPUs each for ~40 platforms.
    • Much harder to manage! You try finding a sysadmin tool that works on 40 platforms!
  › We're just another demanding grid user: if the middleware does not deliver, we feel the pain!

  11. NMI Build & Test Facility (architecture)
  INPUT: an NMI build & test spec file, customer source code, and customer build/test scripts. The spec is expanded into a Condor DAG, and DAGMan queues the build/test jobs onto the distributed build/test pool. OUTPUT: finished binaries, with results stored in a MySQL results database and published through a web portal.
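
  To make the spec-to-DAG flow concrete, here is a rough Python sketch that expands a platform list into a Condor DAGMan file with one build node and one dependent test node per platform. The JOB and PARENT/CHILD keywords are standard DAGMan syntax; the platform names and *.sub submit files are hypothetical stand-ins for what the facility actually generates.

```python
#!/usr/bin/env python3
"""Sketch of turning a build/test spec into a Condor DAGMan workflow,
mirroring the INPUT -> DAGMan -> pool -> results flow on the slide.
Platform names and the *.sub submit files are hypothetical."""

PLATFORMS = ["rhel3_x86", "sol9_sparc", "osx_ppc"]  # hypothetical subset

def write_dag(path="build_test.dag"):
    """Emit one build node and one test node per platform; each test node
    only runs after its platform's build node has produced binaries."""
    lines = []
    for p in PLATFORMS:
        lines.append(f"JOB build_{p} build_{p}.sub")
        lines.append(f"JOB test_{p} test_{p}.sub")
        lines.append(f"PARENT build_{p} CHILD test_{p}")
    with open(path, "w") as dag:
        dag.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_dag()
    # The generated file would then be handed to DAGMan on the submit host,
    # e.g. with: condor_submit_dag build_test.dag
```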

  12. Numbers
  • 100 CPUs
  • 39 HW/OS "platforms"
  • 34 OS
  • 9 HW arch
  • 3 sites
  • ~100 GB of results per day
  • ~1400 builds/tests per month
  • ~350 Condor jobs per day

  Host inventory (Name / Arch / OS):
   1. atlantis.mcs.anl.gov - sparc - sol9
   2. grandcentral - i386 - rh9
   3. janet - i386 - winxp
   4. nmi-build15 - i386 - rh72
   5. nmi-build16 - i386 - rh8
   6. nmi-build17 - i386 - rh9
   7. nmi-build18 - sparc - sol9
   8. nmi-build21 - i386 - fc2
   9. nmi-build29 - sparc - sol8
  10. nmi-build33 - ia64 - sles8
  11. nmi-build5 - i386 - rhel3
  12. nmi-build6 - G5 - osx
  13. nmi-rhas3-amd64 - amd64 - rhel3
  14. nmi-sles8-amd64 - amd64 - sles8
  15. nmi-test-3 - i386 - rh9
  16. nmi-test-4 - i386 - rh9
  17. [unknown] - hp - hpux11
  18. [unknown] - sgi - irix6?
  19. [unknown] - sparc - sol10
  20. [unknown] - sparc - sol7
  21. [unknown] - sparc - sol8
  22. [unknown] - sparc - sol9
  23. nmi-build1 - i386 - rh9
  24. nmi-build14 - ppc - aix52
  25. nmi-build24 - i386 - tao1
  26. nmi-build31 - ppc - aix52
  27. nmi-build32 - i386 - fc3
  28. nmi-build8 - ia64 - rhel3
  29. nmi-dux40f - alpha - dux4
  30. nmi-hpux11 - hp - hpux11
  31. nmi-ia64-1 - ia64 - sles8
  32. nmi-sles8-ia64 - ia64 - sles8
  33. rebbie - i386 - winxp
  34. rocks-{122,123,124}.sdsc.e - i386 - ???
  35. supermicro2 - i386 - rhel4
  36. b80n15.sdsc.edu - ppc - aix51
  37. imola - i386 - rh9
  38. nmi-aix - ppc - aix52
  39. nmi-build2 - i386 - rh8
  40. nmi-build3 - i386 - rh72
  41. nmi-build4 - i386 - winxp
  42. nmi-build7 - G4 - osx
  43. nmi-build9 - ia64 - rhel3
  44. nmi-hpux - hp - hpux10
  45. nmi-irix - sgi - irix65
  46. nmi-redhat72-build - i386 - rh72
  47. nmi-redhat72-dev - i386 - rh72
  48. nmi-redhat80-ia32 - i386 - rh8
  49. nmi-rh72-alpha - alpha - rh72
  50. nmi-solaris8 - sparc - sol8
  51. nmi-solaris9 - sparc - sol9
  52. nmi-test-1 - i386 - rh9
  53. nmi-tru64 - alpha - dux51
  54. vger - i386 - rh73
  55. monster - i386 - rh9
  56. nmi-test-5 - i386 - rh9
  57. nmi-test-6 - i386 - rh9
  58. nmi-test-7 - i386 - rh9
  59. nmi-build22 - i386
  60. nmi-build25 - i386
  61. nmi-build26 - i386
  62. nmi-build27 - i386
  63. nmi-fedora - i386 - fc2

  13. Condor Build & Test
  › Automated Condor builds
    • Two (sometimes three) separate Condor versions, each automatically built using NMI on 13-17 platforms nightly
    • Stable, developer, and special release branches
  › Automated Condor tests
    • Each nightly build's output becomes the input to a new NMI run of our full Condor test suite
  › Ad-hoc builds & tests
    • Each Condor developer can use NMI to submit ad-hoc builds & tests of their experimental workspaces or CVS branches to any or all platforms

  14. [figure]

  15. Users of the BaT Facility
  › The NMI Build & Test Facility was built to serve all NMI projects
  › Who else is building and testing?
    • Globus project
    • SRB project
    • NMI Middleware Distribution
    • Virtual Data Toolkit (VDT)
    • Work in progress: TeraGrid, NEESgrid

  16. Example I – The SRB Client

  17. How did it start?
  › Work done by Wayne Schroeder @ SDSC
  › It started gently; it took a little while for Wayne to warm up to the system
    • He ran into a few problems with bad matches before mastering how we use prereqs
    • Our challenge: better docs, better error messages
    • He emailed Tolya with questions, and Tolya responded "to shed some more general light on the system and help avoid or better debug such problems in the future"
  › Soon he got pretty comfortable with the system
    • He moved on to write his own glue scripts
    • He expanded builds to 34 platforms (!)

  18. Failure, failure, failure… success!

  19. Where we are today
  After ten days (4/10-4/20), Wayne got his builds ported to the NMI BaT facility, and after fewer than 40 runs he reached the point where, with "one button", the SRB project can build its client on 34 platforms with no babysitting. He also found and fixed a problem in the HP-UX version …

  20. Example II – The VDT

  21. What is the VDT?
  › A collection of software
    • Common Grid middleware (Condor, Globus, VOMS, and lots more…)
    • Virtual data software
    • Utilities (CA CRL update)
    • Configuration
    • Computing infrastructure (Apache, Tomcat, MySQL, and more…)
  › An easy installation mechanism
    • Goal: push a button, and everything you need to be a consumer or provider of Grid resources just works
    • Two methods: Pacman (installs and configures it all) and RPM (installs a subset of the software, no configuration)
  › A support infrastructure
    • Coordinate bug fixing
    • Help desk
    • Understand community needs and wishes

  22. What is the VDT?
  › A highly successful collaborative effort
    • VDT Team at UW-Madison
    • VDS (Chimera/Pegasus) team: provides the "V" in VDT
    • Condor Team
    • Globus Alliance
    • NMI Build and Test team
    • EDG/LCG/EGEE: testing, patches, feedback… and supplied software (VOMS, CEmon, CRL-Update, and more…)
    • Pacman: provides the easy installation capability
    • Users: LCG, EGEE, Open Science Grid, US-CMS, US-ATLAS, and many more

  23. VDT Supported Platforms
  › RedHat 7
  › RedHat 9
  › Debian 3.1 (Sarge)
  › RedHat Enterprise Linux 3 AS
  › RedHat Enterprise Linux 4 AS
  › Fedora Core 3
  › Fedora Core 4
  › ROCKS Linux 3.3
  › Fermi Scientific Linux 3.0
  › RedHat Enterprise Linux 3 AS ia64
  › SuSE Linux 9 ia64
  › RedHat Enterprise Linux 3 AS amd64

  24. VDT Components
  › Condor
  › Globus
  › DRM
  › Clarens/jClarens
  › PRIMA
  › GUMS
  › VOMS
  › MyProxy
  › Apache
  › Tomcat
  › MySQL
  › Lots of utilities
  › Lots of configuration scripts
  › And more!
