Repository-based Job Launches
Ivan Furic, University of Florida
Current (up to and including DC1.5) approach
• Edit scripts, FHiCL drivers, etc. in a subdirectory of dunepro@dune-offline
• Launch GUI logs into dune-offline
• Executes shell commands specified in the launch template:
  • cd to a specific subdirectory
  • execute a specific script
• Issues:
  • Unsafe: files can be edited at any time by anyone with dunepro k5login access, with no record of what happened
  • Not very re-usable: launching from a different subdirectory, or executing a different script, requires creating a new launch template
Repository-based technique
• Script specified in the launch template:
  • creates a new temporary directory in /tmp
  • checks out (clones) a repository
  • runs a pre-defined script: "scripts/submissionScript.sh"
• Launch behavior is driven by a maximally generic launch script plus a number of command-line parameters specified by the POMS GUI
• Previously, the launch script was a wrapper around jobsub_submit that changed behavior based on a few command-line parameters
  • gensim / reco / mergeana, etc.
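The clone-and-run flow above can be sketched as a small shell function. The default repository URL and the scripts/submissionScript.sh path are from these slides; the function name, argument handling, and cleanup are my own illustration, not the actual launch template:

```shell
#!/bin/bash
# Sketch of the repository-based launch: make a temp dir in /tmp, clone the
# launch repository into it, and run its pre-defined submission script.
# repo_launch itself is illustrative; only the repo URL and script path
# come from the slides.
repo_launch() {
    local repo=${1:-dunepro@dune-offline.fnal.gov:git/prod-repo.git}
    local launchdir rc
    launchdir=$(mktemp -d /tmp/poms_launch.XXXXXX) || return 1  # new temp dir
    ( cd "$launchdir" &&
      git clone -q "$repo" . &&            # check out (clone) the repository
      ./scripts/submissionScript.sh )      # run the pre-defined script
    rc=$?
    rm -rf "$launchdir"                    # temp dir exists only during launch
    return $rc
}
```

Because the temporary directory is deleted afterwards, the cloned repository is the only lasting record of what was launched.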
Proof-of-principle
• Git repository: dunepro@dune-offline.fnal.gov:git/prod-repo.git
  • dunepro k5login access is needed to read or write it
• Test POMS launch template in the POMS GUI: ikf_test_repo_prod
• Test POMS job type in the POMS GUI: ikf_test_prod_repo
• Test POMS campaign in the POMS GUI: ikf_test_prod_repo
• Working "Hello World, here's my env dump" launch:
  https://pomsgpvm01.fnal.gov/poms/list_launch_file?campaign_id=1177&fname=20180224_200131_ikfuric
New launch script

#!/bin/bash
echo -e "\nRunning\n `basename $0` $@"
cd /dune/app/home/dunepro/protodune-sp/DC1.5

if [ x"$1" = "x--recovery" ]
then
    dataset="$2"
else
    dataset=$(./new_files_in.sh -e dune -d dc1.5_input)
fi

# IKF: uncomment for debugging
# dataset=dc1.5_input
# numfiles=$(samweb -e dune count-definition-files ${dataset})
numfiles=10

# IKF: This was in the jobsub_submit, figure out later how it interacts
# with role & subgroup, for starters remove:
# . `ups setup -z /grid/fermiapp/products/common/db poms_jobsub_wrapper`
# -l "priority=5" \

# args: NOTE args order matters
jobsub_submit \
    -G dune \
    -e SAM_EXPERIMENT=dune \
    --role=Production \
    --subgroup=prod \
    --resource-provides=usage_model=OPPORTUNISTIC,DEDICATED \
    --expected-lifetime=24h \
    --memory 8000MB \
    --dataset_definition=${dataset} \
    -N ${numfiles} \
    file:///dune/app/home/dunepro/protodune-sp/DC1.5/mini_reco_lar.sh
EXITCODE=$?

echo "jobsub_submit terminated with exit code ${EXITCODE}"
echo "dataset=${dataset}"
echo "numfiles=${numfiles}"
Launch example output
• Convention: environment variables starting with DUNEPRO_ modify launch behavior
• NB: the launch script has not yet been modified to use this
Launch Example Output (2)
• Temp directory only exists during the launch
• The only permanent record is in the git repository
Modifying the behavior of a launch
• POMS GUI:
  • Compose Campaign Stages
  • Click on the Edit button
  • Click on the Parameter Overrides Edit button
• Parameter convention:
  • Key of the format --name=
  • Value of the format "many words"
  • scripts/cmd_ln_to_env.sh converts this to DUNEPRO_NAME="many words"
• To do: pass DUNEPRO_* env vars to worker nodes through jobsub_submit
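A plausible sketch of the conversion scripts/cmd_ln_to_env.sh performs, reconstructed from the convention above; the parsing details are my assumption, not the actual script:

```shell
#!/bin/bash
# Sketch of the cmd_ln_to_env.sh convention from this slide: turn each
# --name="many words" parameter override into DUNEPRO_NAME="many words".
# The function body is my reconstruction, not the actual script.
cmd_ln_to_env() {
    local arg key val
    for arg in "$@"; do
        case $arg in
            --*=*)
                key=${arg%%=*}            # "--name=value" -> "--name"
                key=${key#--}             # strip leading "--"
                key=${key^^}              # upper-case (bash 4+)
                key=${key//-/_}           # "-" -> "_" for a valid var name
                val=${arg#--*=}           # everything after the first "="
                export "DUNEPRO_${key}=${val}"
                ;;
        esac
    done
}
```

For example, cmd_ln_to_env --numfiles=10 would leave DUNEPRO_NUMFILES=10 in the environment, ready to be forwarded to worker nodes (e.g. via jobsub_submit -e, as the to-do item suggests).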
Goal
• One repository for job launch
  • Likely a single branch, re-usable for all launches
• One repository for worker node scripts
  • Multiple branches, likely a minimum of 4:
    • Generator type (no input SAM dataset)
    • Reco / processing type (1 input file -> 1 output file)
    • Merge type (multiple input files -> 1 output file)
    • Analysis type (1 input file -> 1 "histogram" output file)
• Attempt to keep everything as general as possible; modify behavior via POMS parameter overrides (previous slide)
Next steps
• Launch from the dune-offline.fnal.gov DNS round-robin alias (done, thanks to Ken Herner's intervention on pomsgpvm01)
• Use submissionScript.sh command-line parameters to modify jobsub_submit behavior
• Launch a "DC1.5-type" campaign
• Retrieve log files and outputs for 10 worker node jobs
• Split the worker node repository from the launch repository
• Use the git archive option to generate the worker node tarball
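The last step, generating a worker node tarball with git archive, could look roughly like this; the output filename and the worker/ prefix are illustrative choices, not from the slides:

```shell
#!/bin/bash
# Sketch of the "git archive" next step: pack one ref of the worker-node
# repository into a tarball, without shipping a full clone to the workers.
# make_worker_tarball, the output name, and the prefix are illustrative.
make_worker_tarball() {
    local repo=$1 ref=${2:-HEAD}
    local out="worker_node.tar.gz"
    git -C "$repo" archive --format=tar.gz --prefix=worker/ \
        -o "$PWD/$out" "$ref" && echo "$out"
}
```

Unlike a clone, git archive emits only the tracked files of one ref, so the worker tarball stays small and carries no git history.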