SLIDE 1

ComplexHPC Spring School, Day 2: KOALA Tutorial
Submitting Jobs With Koala

Nezih Yigitbasi, Delft University of Technology
May 10, 2011

SLIDE 2

[Diagram: KOALA architecture. The clusters each run a local queue with a local scheduler (LS) handling local jobs, covered in Lectures I & II; on top of them sits a global queue with the KOALA grid scheduler handling global jobs, the topic of this lecture. KOALA provides load sharing and co-allocation.]

SLIDE 3

Outline

  • 1. Runners
  • 2. Preparing the Environment
    - Login to a head-node
    - Set up the environment
    - Set SSH public key authentication
    - Create a Job Description File (JDF)
  • 3. Job Submission
    - PRunner
    - OMRunner
    - WRunner
  • 4. Practical Work
SLIDE 4

Runners

  • An extensible framework to add support for:
    - different application types: sequential/parallel, workflows, Bags of Tasks (BoTs)
    - different middlewares/standards: Globus, DRMAA
  • Can submit diverse application types to a heterogeneous multi-cluster grid without changing your application
  • Responsible for:
    - stage-in/stage-out
    - submitting the executable to the middleware
    - monitoring job status
    - responding to failures
SLIDE 5

Runners

  • OMRunner: for OpenMPI co-allocated jobs
  • PRunner: a modified OMRunner for ease of use and for non-co-allocated jobs (no need to write a job description file)
  • KRunner: a Globus job submission tool for clusters using the Globus middleware
  • IRunner: a KRunner-based Ibis job submission tool
  • WRunner: for running BoTs and workflows

SLIDE 6

Outline

  • 1. Runners
  • 2. Preparing the Environment
    - Login to a head-node
    - Set up the environment
    - Set SSH public key authentication
    - Create a Job Description File (JDF)
  • 3. Job Submission
    - PRunner
    - OMRunner
    - WRunner
  • 4. Practical Work
SLIDE 7

Preparing the Environment

  • Four-step process:
    1. Login to a head-node, e.g., fs0.das4.cs.vu.nl
    2. Set up the environment ($PATH, $LD_LIBRARY_PATH, etc.)
    3. Set up SSH public key authentication (passwordless)
    4. Create a Job Description File (JDF)

* KOALA hands-on document: http://bit.ly/lrcDVd

SLIDE 8

Login to a Head-Node

SLIDE 9

Setting the Environment Variables

  • Set the required environment variables in your .bashrc
    - PATH should include the runner executables:
      export PATH=$PATH:/home/koala/koala_bin/bin
    - LD_LIBRARY_PATH should include the DRMAA libraries:
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/
  • Load the required modules; each module contains the information needed to configure the shell for a specific application:
    - module load gcc
    - module load openmpi/gcc/default

* This info is in the hands-on document

SLIDE 10

Configuring Public Key Authentication

  • Runners use SSH to submit tasks to remote hosts (e.g., from fs0 to fs3), so passwordless authentication is needed
  • Use public/private key pairs
    - The public key is used to encrypt messages
    - The private key is used to decrypt messages encrypted with the corresponding public key
  • kssh_keygen.sh --all
    - generates the public/private key pair
    - pushes the public key to all head nodes

* This info is in the hands-on document
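For reference, roughly what kssh_keygen.sh does, expressed with standard OpenSSH tools (prefer the KOALA helper script itself; the head-node names below are examples from the DAS-4 system, and the key type/path are assumptions):

```
# Generate a key pair without a passphrase (needed for passwordless login)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Push the public key to each head node's authorized_keys
for host in fs0.das4.cs.vu.nl fs3.das4.tudelft.nl
do
    ssh-copy-id "$host"
done
```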

SLIDE 11

Job Description File (JDF)

[Figure: an annotated example JDF, highlighting its fields: number of processors (4), default directory, estimated runtime, stdout/stderr, preferred execution site, path of the executable, aggregator, comment, group, and multiple components.]
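The hands-on document gives the exact JDF format. As a purely hypothetical sketch (assuming KOALA job descriptions follow Globus RSL conventions; every path, value, and site name below is illustrative), one component of a job might be described as:

```
+
( &(count=4)
   (directory="/home/user/experiment")
   (maxWallTime=15)
   (stdout="my_app.out")
   (stderr="my_app.err")
   (resourceManagerContact="fs3.das4.tudelft.nl")
   (executable="/home/user/experiment/my_app") )
```

For a co-allocated job, further parenthesized components would follow the leading "+", one per cluster.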

SLIDE 12

Outline

  • 1. Runners
  • 2. Preparing the Environment
    - Login to a head-node
    - Set up the environment
    - Set SSH public key authentication
    - Create a Job Description File (JDF)
  • 3. Job Submission
    - PRunner
    - OMRunner
    - WRunner
  • 4. Practical Work
SLIDE 13

PRunner

  • -host <cluster>: the preferred cluster to run the job on
  • -c <node_count>: the number of nodes
  • -stdout <stdout_file>: the file used for standard output

For more information: http://www.st.ewi.tudelft.nl/koala/
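A hypothetical invocation combining the options above (the binary name follows the lowercase convention of the wrunner examples later in the deck; the cluster name and executable are illustrative):

```
$ prunner -host fs3.das4.tudelft.nl -c 4 -stdout my_app.out ./my_app
```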

SLIDE 14

OMRunner

  • -optComm: try to optimize communication (place each component at the site with the least latency)
  • -f <jdf_file>: the job description file
  • -x <clusters>: a comma-separated list of clusters to exclude
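A hypothetical invocation combining the options above (the binary name, JDF path, and excluded cluster are illustrative; see the hands-on document for the exact usage):

```
$ omrunner -f jdf/my_job.jdf -optComm -x fs0.das4.cs.vu.nl
```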

SLIDE 15

Workflows

  • Applications with dependencies
  • Modeled as a Directed Acyclic Graph (DAG)
    - nodes are executables
    - edges are dependencies (files)
SLIDE 16

Sample Workflow Description

<job id="0" name="Task_0_executable">
  <uses file="file0.out" link="output" type="data"/>
  <uses file="file1.out" link="output" type="data"/>
</job>
<job id="1" name="Task_1_executable">
  <uses file="file0.out" link="input" type="data"/>
</job>
<job id="2" name="Task_2_executable">
  <uses file="file1.out" link="input" type="data"/>
</job>
<child ref="1">
  <parent ref="0"/>
</child>
<child ref="2">
  <parent ref="0"/>
</child>

[Figure annotations: task 0 is the parent (root); tasks 1 and 2 are its children; the files are the dependencies.]

SLIDE 17

Bag of Tasks (BoT)

  • Conveniently parallel applications
    - a DAG without dependencies
  • Usually used for parameter sweep applications
    - a single executable that runs for a large set of parameters (e.g., Monte Carlo simulations, bioinformatics applications, ...)
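As an illustration of the pattern (this is not KOALA code): a BoT is the same executable applied independently to each parameter, so the tasks share no state and can run in any order, on any site. A minimal sketch in plain Python, using a prime-counting task in the spirit of the PrimeSearch example on the next slide:

```python
# Bag-of-Tasks pattern: one "executable" (count_primes) applied
# independently to each parameter range in the bag. Because the tasks
# are independent, a scheduler is free to run them anywhere, in any order.
from concurrent.futures import ProcessPoolExecutor

def count_primes(lo, hi):
    """Count primes in [lo, hi) -- the per-task work."""
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    ranges = [(2, 1000), (1000, 2000), (2000, 3000)]  # the "bag" of parameters
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_primes, *zip(*ranges)))
    print(results)  # prints [168, 135, 127]
```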

SLIDE 18

Sample BoT Description

<job id="0" name="PrimeSearch.py">
  <uses file="range_1.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>
<job id="1" name="PrimeSearch.py">
  <uses file="range_2.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>
<job id="2" name="PrimeSearch.py">
  <uses file="range_3.in" link="input"/>
  <uses file="primes.out" link="output"/>
</job>

[Figure annotations: each task reads its own range_N.in and writes its primes.out.]

SLIDE 19

Running BoTs & Workflows

  • Define a DAX (DAG in XML) file
  • Submit with wrunner
    - -f <job_description>
    - -p <policy>
      • single_site: the whole BoT/workflow on a single site
        - -s: the preferred execution site
      • multi_site: the tasks of the BoT/workflow are distributed over the grid based on the current load of the sites
  • Submit to fs3:
    wrunner -f wf/Diamond.xml -p single_site -s fs3.das4.tudelft.nl
  • Use all sites for execution:
    wrunner -f wf/PrimeSearch.xml -p multi_site

SLIDE 20

Workflow Engine Architecture

[Diagram: workflow engine architecture. The engine takes a workflow description as input and drives execution through DRMAA, a distributed file system (DFS), and SSH plus a custom protocol.]

SLIDE 21

Practical Work

  • Follow the steps in the practical work handout to set up your environment
  • After you download and extract the tar file you will have the following directory structure:

    ComplexHPC_11/
      OMRunner/
      PRunner/
      WRunner/
      wf/           -> DAX files
      applications/ -> executables

SLIDE 22

Summary

  • How to prepare the environment for the KOALA runners
    - login to a head-node
    - set up the environment
    - set SSH public key authentication
    - create a Job Description File (JDF/DAX)
  • How to submit jobs using KOALA
    - sequential jobs (PRunner, no JDF)
    - parallel jobs (OMRunner, with a JDF)
    - workflows (WRunner, with a DAX)
    - BoTs (WRunner, with a DAX)
SLIDE 23

More Information:

  • Email: M.N.Yigitbasi@tudelft.nl
  • Homepage: http://www.st.ewi.tudelft.nl/~nezih/
  • Koala Project: http://st.ewi.tudelft.nl/koala
  • PDS publication database: http://www.pds.twi.tudelft.nl