ComplexHPC Spring School, Day 2: KOALA Tutorial
Submitting Jobs With KOALA
Nezih Yigitbasi, Delft University of Technology
May 10, 2011
This Lecture
[Figure: the KOALA architecture covered in Lectures I & II: jobs arrive in a global queue and are handled by the KOALA grid scheduler, which performs load sharing and co-allocation across the clusters; each cluster has a local queue and a local scheduler (LS) that also serves local jobs.]
2/22
Outline
1. Runners
2. Preparing the Environment
  • Login to a head-node
  • Set up the environment
  • Set up SSH public key authentication
  • Create a Job Description File (JDF)
3. Job Submission
  • PRunner
  • OMRunner
  • WRunner
4. Practical Work
3/22
Runners
• An extensible framework to add support for
  • different application types: sequential/parallel, workflows, Bags of Tasks (BoTs)
  • different middlewares/standards: Globus, DRMAA
• Can submit diverse application types to a heterogeneous multi-cluster grid without changing your application
• Responsible for
  • stage-in/stage-out
  • submitting the executable to the middleware
  • monitoring job status
  • responding to failures
4/22
Runners
• OMRunner: for OpenMPI co-allocated jobs
• PRunner: a modified OMRunner for ease of use and for non-co-allocated jobs (no need to write a job description file)
• KRunner: job submission tool for clusters using the Globus middleware
• IRunner: KRunner-based Ibis job submission tool
• WRunner: for running BoTs and workflows
5/22
Outline
1. Runners
2. Preparing the Environment
  • Login to a head-node
  • Set up the environment
  • Set up SSH public key authentication
  • Create a Job Description File (JDF)
3. Job Submission
  • PRunner
  • OMRunner
  • WRunner
4. Practical Work
6/22
Preparing the Environment
• Four-step process
  1. Log in to a head-node, e.g., fs0.das4.cs.vu.nl
  2. Set up the environment: $PATH, $LD_LIBRARY_PATH, etc.
  3. Set up SSH public key authentication (passwordless)
  4. Create a Job Description File
* KOALA hands-on document: http://bit.ly/lrcDVd
7/22
Login to a Head-Node 8/22
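Logging in to the head node is a plain SSH session; a sketch (the username is a placeholder for your course account):

```
ssh yourname@fs0.das4.cs.vu.nl
```

All subsequent setup steps, and the runner invocations later in this tutorial, are executed in this shell on the head node.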
Setting the Environment Variables
• Set the required environment variables in your .bashrc
• The PATH environment variable should include the runner executables
  • export PATH=$PATH:/home/koala/koala_bin/bin
• LD_LIBRARY_PATH should include the DRMAA libraries
  • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/
• Load the required modules
  • Each module contains the information needed to configure the shell for a specific application
  • module load gcc
  • module load openmpi/gcc/default
* This info is in the hands-on document
9/22
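Put together, the additions to .bashrc sketched on this slide look like the following fragment (paths are taken from the slide; adjust them to your installation):

```shell
# Runner executables on the search path
export PATH=$PATH:/home/koala/koala_bin/bin

# DRMAA libraries needed by the runners
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/

# Compiler and MPI modules
module load gcc
module load openmpi/gcc/default
```

Open a new shell, or run `source ~/.bashrc`, for the changes to take effect.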
Configuring Public Key Authentication
• Runners use SSH to submit tasks to remote hosts, e.g., from fs0 to fs3
• Passwordless authentication is needed
• Uses public/private key pairs
  • The client proves its identity with its private key by signing an authentication challenge; the server verifies the signature with the stored public key
• kssh_keygen.sh --all
  • Generates public/private key pairs
  • Pushes public keys to all head nodes
* This info is in the hands-on document
10/22
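To see what kssh_keygen.sh automates, a manual equivalent with standard OpenSSH tools looks roughly like this (the key is written to a scratch directory purely for illustration; in practice the keys live under ~/.ssh on fs0):

```shell
# Generate a passphrase-less RSA key pair (-N "" gives passwordless login)
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa_koala"

# The pair: the private key stays on the submitting head node,
# the public key is pushed to the other head nodes
ls "$keydir"   # id_rsa_koala  id_rsa_koala.pub

# Pushing the public key to one head node (the script does this for all):
# ssh-copy-id -i "$keydir/id_rsa_koala.pub" fs3.das4.tudelft.nl
```

On DAS-4, prefer the provided kssh_keygen.sh script, since it knows the full list of head nodes.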
Job Description File (JDF)
[Annotated JDF example. Highlighted fields: number of processors (4), preferred execution site, default directory, path of the executable, stdout/stderr files, estimated runtime, a comment, the aggregator, and the grouping of multiple components.]
11/22
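KOALA job description files follow the Globus RSL syntax. A hypothetical two-component (co-allocated) JDF might look like the sketch below; all sites, paths, and values are illustrative, and KOALA-specific attributes may differ, so consult the hands-on document for the exact format:

```
+( &(resourceManagerContact="fs3.das4.tudelft.nl")
    (count=4)
    (directory="/home/user/exp")
    (executable="/home/user/exp/myapp")
    (stdout="myapp_0.out")
    (stderr="myapp_0.err")
    (maxWallTime=15) )
 ( &(resourceManagerContact="fs0.das4.cs.vu.nl")
    (count=4)
    (directory="/home/user/exp")
    (executable="/home/user/exp/myapp")
    (stdout="myapp_1.out")
    (stderr="myapp_1.err")
    (maxWallTime=15) )
```

The leading `+` groups multiple components; each `&(...)` block describes one component with its processor count, execution site, executable, and output files.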
Outline
1. Runners
2. Preparing the Environment
  • Login to a head-node
  • Set up the environment
  • Set up SSH public key authentication
  • Create a Job Description File (JDF)
3. Job Submission
  • PRunner
  • OMRunner
  • WRunner
4. Practical Work
12/22
PRunner
• -host <cluster>: the preferred cluster to run the job
• -c <node_count>: number of nodes
• -stdout <stdout_file>: file used for standard output
For more information: http://www.st.ewi.tudelft.nl/koala/
13/22
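Combining these flags, a submission of a (hypothetical) executable to the fs3 head node might look like the line below; the executable name is made up, and the exact argument order may differ, so check the KOALA site above:

```shell
prunner -host fs3.das4.tudelft.nl -c 1 -stdout myapp.out ./myapp
```

Note that, unlike the other runners, no job description file is needed.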
OMRunner
• -f <jdf_file>: job description file
• -optComm: try to optimize communication (place components at the sites with the least latency)
• -x <clusters>: comma-separated list of clusters to exclude
14/22
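For example, a co-allocated OpenMPI job could be submitted as follows; the JDF name is made up and the excluded cluster is just an illustration:

```shell
omrunner -f myjob.jdf -optComm -x fs0.das4.cs.vu.nl
```

With -optComm, KOALA tries to place the job's components at the sites with the least inter-site latency, subject to the exclusions given with -x.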
Workflows
• Applications with dependencies
• Modeled as a Directed Acyclic Graph (DAG)
  • Nodes are executables
  • Edges are dependencies (files)
15/22
Sample Workflow Description
[Diagram: task 0 is the parent (root); tasks 1 and 2 are its children.]

<job id="0" name="Task_0_executable">
  <uses file="file0.out" link="output" type="data"/>
  <uses file="file1.out" link="output" type="data"/>
</job>
<job id="1" name="Task_1_executable">
  <uses file="file0.out" link="input" type="data"/>
</job>
<job id="2" name="Task_2_executable">
  <uses file="file1.out" link="input" type="data"/>
</job>

<!-- Dependencies -->
<child ref="1">
  <parent ref="0"/>
</child>
<child ref="2">
  <parent ref="0"/>
</child>
16/22
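To form a complete DAX file, the job and dependency elements above are wrapped in a single root element. The sketch below follows the Pegasus DAX convention (an assumption; KOALA's exact schema may differ, so inspect the example files under wf/ in the practical-work tarball):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<adag name="Diamond">
  <!-- <job .../> elements, then <child>/<parent> dependency elements -->
</adag>
```

The name attribute identifies the workflow; for BoTs the same structure is used, simply without any child/parent elements.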
Bag of Tasks (BoT)
• Embarrassingly parallel applications
• A DAG without dependencies
• Usually used for parameter-sweep applications
  • A single executable that runs for a large set of parameters (e.g., Monte Carlo simulations, bioinformatics applications, ...)
17/22
Sample BoT Description
[Diagram: three input files (range_1.in, range_2.in, range_3.in), each processed by an independent PrimeSearch.py task producing a primes.out output.]

<job id="0" name="PrimeSearch.py">
  <uses file="range_1.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>
<job id="1" name="PrimeSearch.py">
  <uses file="range_2.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>
<job id="2" name="PrimeSearch.py">
  <uses file="range_3.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>
18/22
Running BoTs & Workflows
• Define a DAX (DAG in XML) file
• Submit with wrunner
  • -f <job_description>
  • -p <policy>
    • single_site: run the whole BoT/workflow on a single site
    • multi_site: distribute the tasks of the BoT/workflow over the grid based on the current load of the sites
  • -s <site>: preferred execution site
• Submit to fs3:
  wrunner -f wf/Diamond.xml -p single_site -s fs3.das4.tudelft.nl
• Use all sites for execution:
  wrunner -f wf/PrimeSearch.xml -p multi_site
19/22
Workflow Engine Architecture
[Diagram: the workflow description is submitted to the workflow engine, which drives execution via DRMAA, SSH, and a custom protocol, with files on a distributed file system (DFS).]
20/22
Practical Work
• Follow the steps in the practical-work handout to set up your environment
• After you download and extract the tar file you will have the following directory structure:
  ComplexHPC_11/
    OMRunner/
    PRunner/
    WRunner/
    applications/  -> executables
    wf/            -> DAX files
21/22
Summary
• How to prepare the environment for the KOALA runners
  • Login to a head-node
  • Set up the environment
  • Set up SSH public key authentication
  • Create a Job Description File (JDF/DAX)
• How to submit jobs using KOALA
  • Sequential jobs (PRunner, no JDF)
  • Parallel jobs (OMRunner, with JDF)
  • Workflows (WRunner, with DAX)
  • BoTs (WRunner, with DAX)
22/22
More Information
• M.N.Yigitbasi@tudelft.nl
• http://www.st.ewi.tudelft.nl/~nezih/
• KOALA project: http://st.ewi.tudelft.nl/koala
• PDS publication database: http://www.pds.twi.tudelft.nl
23/22