
Repository-based Job Launches Ivan Furic University of Florida



  1. Repository-based Job Launches Ivan Furic University of Florida

  2. Current approach (up to and including DC1.5)
     • Edit scripts, fcl drivers, etc. in a subdirectory of dunepro@dune-offline
     • Launch GUI logs into dune-offline
     • Executes shell commands specified in launch template
       • cd to specific subdir
       • execute specific script
     • Issues:
       • Unsafe: files are available for anyone with dunepro k5login access to edit at any time, with no record of what happened
       • Not very re-usable: to launch from a different subdir, or execute a different script, need to create a new launch template

  3. Repository-based technique
     • Script specified in launch template:
       • creates new temporary directory in /tmp
       • checks out (clones) a repository
       • runs a pre-defined script: "scripts/submissionScript.sh"
     • Launch behavior driven by a maximally generic launch script plus a number of command line parameters specified by the POMS GUI
     • Previously, the launch script was a wrapper for jobsub_submit which changed behavior based on a few command line parameters
       • gensim / reco / mergeana etc.
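The three steps the launch template performs can be sketched as a small bash function. This is only an illustration under assumptions: `launch_from_repo` and the `/tmp/launch.XXXXXX` directory pattern are hypothetical names, and only the sequence (temp directory, clone, run `scripts/submissionScript.sh`) comes from the slide.

```bash
#!/bin/bash
# Sketch of a repository-based launch. launch_from_repo is a hypothetical
# helper name; the slides only specify the three numbered steps below.
launch_from_repo() {
    local repo="$1"; shift
    local workdir status

    # 1. create a new temporary directory in /tmp
    workdir=$(mktemp -d /tmp/launch.XXXXXX) || return 1

    # 2. check out (clone) the repository
    if ! git clone --quiet "$repo" "$workdir/repo"; then
        rm -rf "$workdir"
        return 1
    fi

    # 3. run the pre-defined script, forwarding the launch parameters
    ( cd "$workdir/repo" && bash scripts/submissionScript.sh "$@" )
    status=$?

    # the temporary directory is removed once the launch has finished
    rm -rf "$workdir"
    return "$status"
}

# Example call (not executed here):
#   launch_from_repo dunepro@dune-offline.fnal.gov:git/prod-repo.git --stage=reco
```

Because the script path is fixed, a different launch only needs different parameters (or a different commit in the repository), not a new launch template.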

  4. Proof-of-principle
     • Git repository is dunepro@dune-offline.fnal.gov:git/prod-repo.git
       • Need dunepro k5login ability to access (read / write)
     • Test launch POMS template in POMS GUI: ikf_test_repo_prod
     • Test launch POMS job type in POMS GUI: ikf_test_prod_repo
     • Test launch POMS campaign in POMS GUI: ikf_test_prod_repo
     • Working "Hello World, here's my env dump" launch: https://pomsgpvm01.fnal.gov/poms/list_launch_file?campaign_id=1177&fname=20180224_200131_ikfuric

  5. New launch script

      #!/bin/bash
      . `ups setup -z /grid/fermiapp/products/common/db poms_jobsub_wrapper`

      # args: NOTE args order matter
      echo -e "\nRunning\n `basename $0` $@"

      cd /dune/app/home/dunepro/protodune-sp/DC1.5

      if [ x"$1" = "x--recovery" ]
      then
          dataset="$2"
      else
          dataset=$(./new_files_in.sh -e dune -d dc1.5_input)
      fi

      # IKF: uncomment for debugging
      dataset=dc1.5_input
      # numfiles=$(samweb -e dune count-definition-files ${dataset})
      numfiles=10

      echo "dataset=${dataset}"
      echo "numfiles=${numfiles}"

      # IKF: This was in the jobsub_submit, figure out later how it interacts
      # with role & subgroup, for starters remove
      #     -l "priority=5" \
      jobsub_submit \
          -G dune \
          -e SAM_EXPERIMENT=dune \
          --role=Production \
          --subgroup=prod \
          --resource-provides=usage_model=OPPORTUNISTIC,DEDICATED \
          --expected-lifetime=24h \
          --memory 8000MB \
          --dataset_definition=${dataset} \
          -N ${numfiles} \
          file:///dune/app/home/dunepro/protodune-sp/DC1.5/mini_reco_lar.sh
      EXITCODE=$?
      echo "jobsub_submit terminated with exit code ${EXITCODE}"

  6. Launch example output
     • Convention: environment variables starting with DUNEPRO_ modify launch behavior
     • NB: launch script not yet modified to use this
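One way the launch script could adopt the DUNEPRO_ convention is to read each setting from the environment and fall back to the current hard-coded value. The sketch below assumes two variable names, `DUNEPRO_DATASET` and `DUNEPRO_NUMFILES`, which are hypothetical and chosen only for illustration; the defaults are the debugging values from the launch script.

```bash
#!/bin/bash
# Hypothetical variable names: DUNEPRO_DATASET, DUNEPRO_NUMFILES.
resolve_launch_settings() {
    # ${VAR:-default} keeps the default when VAR is unset or empty.
    dataset=${DUNEPRO_DATASET:-dc1.5_input}
    numfiles=${DUNEPRO_NUMFILES:-10}
}

resolve_launch_settings
echo "dataset=${dataset}"
echo "numfiles=${numfiles}"
```

With this pattern a launch that sets no DUNEPRO_ variables behaves exactly as before, so the convention can be introduced one parameter at a time.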

  7. Launch example output (2)
     • Temp directory only exists during launch
     • Only permanent record is in git repository

  8. Modifying the behavior of a launch
     • POMS GUI
       • Compose Campaign Stages
       • Click on Edit Button
       • Click on Parameter Overrides Edit Button
     • Parameter convention:
       • Key of format --name=
       • Value format "many words"
       • scripts/cmd_ln_to_env.sh converts to DUNEPRO_NAME="many words"
     • todo: pass DUNEPRO_* env vars to worker nodes through jobsub_submit
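The contents of scripts/cmd_ln_to_env.sh are not shown in the slides, so the following is only a guess at how such a conversion could work. It assumes each parameter override reaches the script as a single `--name=value` argument; the function name `cmd_ln_to_env` and the rule of mapping `-` to `_` in names are assumptions made for this sketch.

```bash
#!/bin/bash
# Sketch only: the real scripts/cmd_ln_to_env.sh may differ.
# Converts arguments of the form --name=value into exported
# DUNEPRO_NAME=value variables; anything else is ignored.
cmd_ln_to_env() {
    local arg key value
    for arg in "$@"; do
        case "$arg" in
            --*=*)
                key=${arg%%=*}      # text before the first '='  -> --name
                key=${key#--}       # drop the leading dashes    -> name
                value=${arg#*=}     # text after the first '='   -> many words
                # upper-case the name and map '-' to '_' (assumed rule)
                key=$(printf '%s' "$key" | tr 'a-z-' 'A-Z_')
                # skip names that would not be valid variable names
                case "$key" in
                    ''|*[!A-Z0-9_]*) continue ;;
                esac
                export "DUNEPRO_${key}=${value}"
                ;;
        esac
    done
}

# Example (not executed here):
#   cmd_ln_to_env --name="many words"
#   echo "$DUNEPRO_NAME"
```

For the exported variables to be visible to the launch script, the function would have to be defined in, or sourced by, that script rather than run as a separate process.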

  9. Goal
     • One repository for job launch
       • Likely one single branch, re-usable for all launches
     • One repository for worker node scripts
       • Multiple branches, likely min 4:
         • Generator Type (no input SAM dataset)
         • Reco / Processing Type (1 input file -> 1 output file)
         • Merge Type (multiple input files -> 1 output file)
         • Analysis Type (1 input file -> 1 "histogram" output file)
     • Attempt to keep everything as general as possible, modify behavior via POMS parameter overrides (previous slide)
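The todo from the previous slide, passing DUNEPRO_* variables on to worker nodes, could be handled by turning each variable into a `-e NAME=value` option, the same form the launch script already uses for `SAM_EXPERIMENT`. The sketch below is written under that assumption; `collect_dunepro_env_args` and `env_args` are hypothetical names, and the code relies on bash features (prefix expansion, indirect expansion, arrays).

```bash
#!/bin/bash
# Sketch: build "-e NAME=value" options for every DUNEPRO_* variable.
# collect_dunepro_env_args and env_args are hypothetical names.
collect_dunepro_env_args() {
    local name
    env_args=()
    # "${!DUNEPRO_@}" expands to the names of all set variables
    # whose names begin with DUNEPRO_.
    for name in "${!DUNEPRO_@}"; do
        [ -n "$name" ] || continue
        # ${!name} is indirect expansion: the value of the variable named by $name.
        env_args+=(-e "${name}=${!name}")
    done
}

# Usage inside the launch script (not executed here):
#   collect_dunepro_env_args
#   jobsub_submit -G dune "${env_args[@]}" ... file:///path/to/worker_script.sh
```

Keeping the options in an array preserves values that contain spaces, such as "many words", when they are expanded onto the jobsub_submit command line.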

  10. Next steps:
      • Launch from dune-offline.fnal.gov DNS round-robin alias (done, thanks to Ken Herner's intervention on pomsgpvm01)
      • Use submissionScript.sh cmd line parameters to modify jobsub_submit behavior
      • Launch "DC1.5-type" campaign
      • Retrieve log files, outputs for 10 worker node jobs
      • Split worker node repository from launch repository
      • Use git archive option to generate worker node tarball
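The last step, generating the worker node tarball with git archive, might look like the sketch below. `make_worker_tarball` is a hypothetical helper, and the branch name and paths in the example call are placeholders rather than names taken from the slides.

```bash
#!/bin/bash
# Sketch: pack one branch of the worker node repository into a tarball.
# make_worker_tarball is a hypothetical helper name.
make_worker_tarball() {
    local repo_dir="$1" branch="$2" out="$3"
    # git archive writes the tree of the given branch without the .git history.
    # Pass an absolute path for $out: -C makes git change into $repo_dir first,
    # so a relative path would be resolved inside the repository.
    git -C "$repo_dir" archive --format=tar.gz --output="$out" "$branch"
}

# Example call (not executed here; names are placeholders):
#   make_worker_tarball /path/to/worker-repo reco /tmp/worker_reco.tar.gz
```

With one branch per job type, the same helper would produce a separate tarball for generator, reco, merge, and analysis jobs just by changing the branch argument.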
