HP XC System 2.x Software User Manual: Example 7-6, Submitting an HP-MPI Job; Section 7.4.6, Submitting a Batch Job or Job Script

The srun command, used by the mpirun command to launch the MPI tasks in parallel, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC. Recall that the value of this environment variable is equivalent to the number provided by the -n option of the bsub command.

Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. Each node contains 2 processors, for a total of 20 processors available to LSF jobs.

Example 7-6 runs a hello_world MPI program on four processors.

Example 7-6: Submitting an HP-MPI Job

$ bsub -n4 -I mpirun -srun ./hello_world

Job <75> is submitted to default queue <normal>.

<<Waiting for dispatch ...>>

<<Starting on lsfhost.localdomain>>

Hello world! I’m 0 of 4 on n2

Hello world! I’m 1 of 4 on n2

Hello world! I’m 2 of 4 on n4

Hello world! I’m 3 of 4 on n4

Example 7-7 runs the same hello_world MPI program on four processors, but uses the external SLURM scheduler to request one task per node.

Example 7-7: Submitting an HP-MPI Job with a Specific Topology Request

$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world

Job <77> is submitted to default queue <normal>.

<<Waiting for dispatch ...>>

<<Starting on lsfhost.localdomain>>

Hello world! I’m 0 of 4 on n1

Hello world! I’m 1 of 4 on n2

Hello world! I’m 2 of 4 on n3

Hello world! I’m 3 of 4 on n4

If the MPI job requires an appfile, or cannot use the srun command as the task launcher for another reason, some preprocessing is needed to determine the hostnames of the nodes on which mpirun's standard task launcher should start the tasks. In such scenarios, you need to write a batch script. Several methods are available for determining the nodes in an allocation. One is to use the SLURM_JOBID environment variable with the squeue command to query the nodes. Another is to use LSF environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS, which are prepared by the HP XC job starter script.
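As a minimal sketch of the second method, the following shell fragment converts LSB_MCPU_HOSTS into appfile lines. It assumes that the variable holds alternating host names and processor counts (for example, "n2 2 n4 2") and that the appfile accepts lines of the form -h host -np count program. The function name build_appfile and the sample value are illustrative and do not come from this manual.

```shell
#!/bin/sh
# Sketch only: build_appfile is a hypothetical helper, not part of HP XC.
# Assumes LSB_MCPU_HOSTS looks like "host1 ncpus1 host2 ncpus2 ...".
build_appfile() {
    prog=$1
    # Split the variable into positional parameters: host count host count ...
    set -- $LSB_MCPU_HOSTS
    while [ $# -ge 2 ]; do
        # One appfile line per host (assumed format: -h host -np count program).
        echo "-h $1 -np $2 $prog"
        shift 2
    done
}

# Sample value for illustration; inside a real job the variable is already set.
LSB_MCPU_HOSTS=${LSB_MCPU_HOSTS:-"n2 2 n4 2"}
build_appfile ./hello_world

# In a batch script the output would typically be saved and handed to mpirun:
#   build_appfile ./hello_world > appfile.$LSB_JOBID
#   mpirun -f appfile.$LSB_JOBID
```

The function only prints the lines, so the script can inspect or adjust them before passing the file to mpirun.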

7.4.6 Submitting a Batch Job or Job Script

The bsub command format to submit a batch job or job script is:

bsub -n num-procs [bsub-options] script-name

The -n num-procs parameter specifies the number of processors the job requests. -n num-procs is required for parallel jobs. script-name is the name of the batch job or script. Any bsub options can be included. The script can contain one or more srun or mpirun commands and options.

The script is executed once on the first allocated node, and any srun or mpirun commands within the script can use some or all of the allocated compute nodes.
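The following sketch illustrates this pattern. It writes a small job script and submits it on four processors. The script name myscript.sh is hypothetical, and the submission is guarded so that the fragment does nothing harmful on a machine without LSF.

```shell
#!/bin/sh
# Sketch only: myscript.sh is a hypothetical script name.
cat > myscript.sh <<'EOF'
#!/bin/sh
# This script runs once, on the first allocated node.
# srun takes its task count from SLURM_NPROCS, which LSF-HPC sets from -n.
srun hostname
# Launch the MPI program through srun, as in Example 7-6.
mpirun -srun ./hello_world
EOF
chmod +x myscript.sh

# Submit only where bsub exists; otherwise show the command that would be used.
if command -v bsub >/dev/null 2>&1; then
    bsub -n4 -I ./myscript.sh
else
    echo "bsub not found; would run: bsub -n4 -I ./myscript.sh"
fi
```

Because the script body runs on a single node, any command in it that is not started with srun or mpirun executes only once rather than once per processor.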
