  1. ComplexHPC Spring School, Day 2: KOALA Tutorial. Submitting Jobs with KOALA. Nezih Yigitbasi, Delft University of Technology. May 10, 2011

  2. This Lecture: load sharing and co-allocation with the KOALA grid scheduler. [Diagram: the KOALA scheduler with its global queue sits above a set of clusters; each cluster runs its own local scheduler (LS) with a local queue, serving both global jobs and local jobs.]

  3. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  4. Runners
     • An extensible framework that adds support for
        • Different application types: sequential/parallel, workflows, Bags of Tasks (BoTs)
        • Different middlewares/standards: Globus, DRMAA
     • Can submit diverse application types to a heterogeneous multi-cluster grid without changing your application
     • Responsible for
        • Stage-in/stage-out
        • Submitting the executable to the middleware
        • Monitoring job status
        • Responding to failures

  5. Runners
     • OMRunner: for OpenMPI co-allocated jobs
     • PRunner: OMRunner modified for ease of use and for non-co-allocated jobs (no need to write a job description file)
     • KRunner: job submission tool for clusters using the Globus middleware
     • IRunner: KRunner-based Ibis job submission tool
     • WRunner: for running BoTs and workflows

  6. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  7. Preparing the Environment
     • A four-step process:
        1. Login to a head-node, e.g., fs0.das4.cs.vu.nl
        2. Set up the environment ($PATH, $LD_LIBRARY_PATH, etc.)
        3. Set SSH public key authentication (passwordless)
        4. Create a Job Description File
     * KOALA hands-on document: http://bit.ly/lrcDVd

  8. Login to a Head-Node

  9. Setting the Environment Variables
     • Set the required environment variables in your .bashrc
     • PATH should include the runner executables
        • export PATH=$PATH:/home/koala/koala_bin/bin
     • LD_LIBRARY_PATH should include the DRMAA libraries
        • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/
     • Load the required modules; each module contains the information needed to configure the shell for a specific application
        • module load gcc
        • module load openmpi/gcc/default
     * This info is in the hands-on document

  10. Configuring Public Key Authentication
     • Runners use SSH to submit tasks to remote hosts, e.g., from fs0 to fs3
     • This requires passwordless authentication, using public/private key pairs
        • The public key is used to encrypt messages; the private key is used to decrypt messages encrypted with the corresponding public key
     • kssh_keygen.sh -all
        • Generates a public/private key pair
        • Pushes the public key to all head nodes
     * This info is in the hands-on document
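The key-pair setup that kssh_keygen.sh automates can be sketched with plain OpenSSH commands; this is an illustrative equivalent, not the script itself (the key file name is arbitrary and the remote host is only an example):

```shell
# Generate a passphrase-less RSA key pair (-N "" sets an empty passphrase,
# -q suppresses interactive output), similar to what kssh_keygen.sh does
ssh-keygen -t rsa -b 2048 -N "" -q -f ./koala_demo_key

# Pushing the public key to a head node would then look like:
#   ssh-copy-id -i ./koala_demo_key.pub user@fs3.das4.tudelft.nl
ls -l koala_demo_key koala_demo_key.pub
```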

  11. Job Description File (JDF) [annotated example; the callouts label its fields: grouping of multiple components, number of processors (4), preferred execution site, default aggregator, path of the executable, execution directory, stdout/stderr files, estimated runtime, and comments]
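The fields named by the callouts can be pictured as a schematic job description. The sketch below is purely illustrative and does not reproduce the real JDF syntax (which is given in the KOALA hands-on document); only the field meanings mirror the slide:

```text
# hypothetical sketch, not real JDF syntax
components = 2                      # multiple co-allocated components
count = 4                           # processors per component
site = fs3.das4.tudelft.nl          # preferred execution site
executable = /home/user/myapp       # path of the executable
directory = /home/user              # execution directory
stdout = myapp.out                  # standard output file
stderr = myapp.err                  # standard error file
runtime = 15                        # estimated runtime
```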

  12. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  13. PRunner
     • -host <cluster>: the preferred cluster to run the job on
     • -c <node_count>: number of nodes
     • -stdout <stdout_file>: file used for standard output
     For more information: http://www.st.ewi.tudelft.nl/koala/
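Putting these flags together, a submission might look as follows; the executable name and cluster are placeholders, and the exact invocation should be checked against the KOALA documentation:

```shell
# Hypothetical example: run ./myapp on 4 nodes, preferring the fs3 cluster,
# with standard output written to myapp.out
prunner -host fs3.das4.tudelft.nl -c 4 -stdout myapp.out ./myapp
```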

  14. OMRunner
     • -f <jdf_file>: job description file
     • -x <clusters>: comma-separated list of clusters to exclude
     • -optComm: try to optimize communication (place each component at the site with the least latency)
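A combined invocation might look as follows; the JDF name and excluded clusters are placeholders, so verify the exact form against the KOALA documentation:

```shell
# Hypothetical example: submit the co-allocated OpenMPI job described in
# myjob.jdf, excluding two clusters and optimizing component placement
omrunner -f myjob.jdf -x fs0.das4.cs.vu.nl,fs1.das4.liacs.nl -optComm
```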

  15. Workflows
     • Applications with dependencies, modeled as a Directed Acyclic Graph (DAG)
        • Nodes are executables
        • Edges are dependencies (files)

  16. Sample Workflow Description
     Task 0 is the parent (root); tasks 1 and 2 are its children, each depending on one of its output files:
     <job id="0" name="Task_0_executable">
       <uses file="file0.out" link="output" type="data"/>
       <uses file="file1.out" link="output" type="data"/>
     </job>
     <job id="1" name="Task_1_executable">
       <uses file="file0.out" link="input" type="data"/>
     </job>
     <job id="2" name="Task_2_executable">
       <uses file="file1.out" link="input" type="data"/>
     </job>
     <child ref="1">
       <parent ref="0"/>
     </child>
     <child ref="2">
       <parent ref="0"/>
     </child>
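To see how the child/parent elements encode the DAG, a description like the one above can be parsed with a few lines of Python; the enclosing adag root element is an assumption here (Pegasus-style DAX files use such a wrapper):

```python
import xml.etree.ElementTree as ET

# Workflow description from the slide, wrapped in an assumed <adag> root
dax = """
<adag>
  <job id="0" name="Task_0_executable">
    <uses file="file0.out" link="output" type="data"/>
    <uses file="file1.out" link="output" type="data"/>
  </job>
  <job id="1" name="Task_1_executable">
    <uses file="file0.out" link="input" type="data"/>
  </job>
  <job id="2" name="Task_2_executable">
    <uses file="file1.out" link="input" type="data"/>
  </job>
  <child ref="1"><parent ref="0"/></child>
  <child ref="2"><parent ref="0"/></child>
</adag>
"""

root = ET.fromstring(dax)

# Map each child task to the parent tasks it depends on
deps = {c.get("ref"): [p.get("ref") for p in c.findall("parent")]
        for c in root.findall("child")}
print(deps)  # {'1': ['0'], '2': ['0']}
```

Tasks 1 and 2 both list task 0 as their parent, so the scheduler must run task 0 first and can then run tasks 1 and 2 in parallel.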

  17. Bag of Tasks (BoT)
     • Conveniently parallel applications: a DAG without dependencies
     • Usually used for parameter sweeps: a single executable that runs over a large set of parameters (e.g., Monte Carlo simulations, bioinformatics applications)

  18. Sample BoT Description
     Three independent instances of PrimeSearch.py, each reading a different input range (range_1.in, range_2.in, range_3.in) and producing a primes.out file:
     <job id="0" name="PrimeSearch.py">
       <uses file="range_1.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>
     <job id="1" name="PrimeSearch.py">
       <uses file="range_2.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>
     <job id="2" name="PrimeSearch.py">
       <uses file="range_3.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>
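The per-task input files in a sweep like this are typically produced by splitting one parameter range into equal chunks; a minimal sketch, in which the numeric bounds and the three-way split are invented purely to match the range_1.in to range_3.in naming above:

```python
# Split a numeric search range into equal chunks, one input file per BoT task.
# The bounds (2..302) and the task count are illustrative only.
def split_range(lo, hi, tasks):
    step = (hi - lo) // tasks
    return [(lo + i * step, lo + (i + 1) * step) for i in range(tasks)]

chunks = split_range(2, 302, 3)
for i, (lo, hi) in enumerate(chunks, start=1):
    with open(f"range_{i}.in", "w") as f:
        f.write(f"{lo} {hi}\n")

print(chunks)  # [(2, 102), (102, 202), (202, 302)]
```

Each PrimeSearch.py instance would then read its own range file, which is what makes the tasks fully independent.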

  19. Running BoTs & Workflows
     • Define a DAX (DAG in XML) file
     • Submit with wrunner
        • -f <job_description>
        • -p <policy>
           • single_site: run the whole BoT/workflow on a single site (-s <site>: preferred execution site)
           • multi_site: distribute the tasks of the BoT/workflow across the grid based on the current load of the sites
     • Submit to fs3:
        wrunner -f wf/Diamond.xml -p single_site -s fs3.das4.tudelft.nl
     • Use all sites for execution:
        wrunner -f wf/PrimeSearch.xml -p multi_site

  20. Workflow Engine Architecture [diagram: a workflow description is submitted to the engine, which drives execution via DRMAA, SSH, and a custom protocol, with file transfer over DFS]

  21. Practical Work
     • Follow the steps in the practical work handout to set up your environment
     • After you download and extract the tar file, you will have the following directory structure:
        ComplexHPC_11/
           OMRunner/
           PRunner/
           WRunner/
           applications/  -> executables
           wf/            -> DAX files

  22. Summary
     • How to prepare the environment for the KOALA runners
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF/DAX)
     • How to submit jobs using KOALA
        • Sequential jobs (PRunner, no JDF)
        • Parallel jobs (OMRunner, with a JDF)
        • Workflows (WRunner, with a DAX)
        • BoTs (WRunner, with a DAX)

  23. More Information
     • M.N.Yigitbasi@tudelft.nl, http://www.st.ewi.tudelft.nl/~nezih/
     • KOALA Project: http://st.ewi.tudelft.nl/koala
     • PDS publication database: http://www.pds.twi.tudelft.nl
