HP XC System 2.x Software User Manual


Each partition's node limits supersede those specified by -N. Jobs that request more nodes than the partition allows never leave the PENDING state. To use a specific partition, use the srun -p option. Combinations of -n and -N control how job processes are distributed among nodes according to the following srun policies:

-n/-N combinations

srun infers your intended number of processes per node if you specify both the number of processes and the number of nodes for your job. Thus -n16 -N8 normally results in running 2 processes/node. But see the next policy for exceptions.
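
The arithmetic behind this inference can be sketched in Python. This is only an illustrative model, not srun's implementation: the function name processes_per_node is hypothetical, and the way a remainder is spread over the first nodes is an assumption that the text above does not specify.

```python
def processes_per_node(nprocs, nnodes):
    """Spread nprocs over nnodes as evenly as possible (illustrative model only)."""
    if nprocs < 1 or nnodes < 1:
        raise ValueError("process and node counts must be positive")
    base, extra = divmod(nprocs, nnodes)
    # Assumption: when the division is uneven, the first `extra` nodes
    # each carry one additional process.
    return [base + 1 if i < extra else base for i in range(nnodes)]


# Models -n16 -N8: every one of the 8 nodes runs 2 processes.
print(processes_per_node(16, 8))
```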

Minimum interpretation

srun interprets all node requests as minimum node requests (-N16 means "at least 16 nodes"). If some nodes lack enough CPUs to cover the process count specified by -n, srun will automatically allocate more nodes (than mentioned with -N) to meet the need. For example, if not all nodes have 2 working CPUs, then -n32 -N16 together will allocate more than 16 nodes so that all processes are supported. The actual number of nodes assigned (not the number requested) is stored in environment variable SLURM_NNODES.
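
The minimum interpretation can be modeled roughly as follows. This is a sketch under stated assumptions, not the scheduler's actual algorithm: the function nodes_allocated is hypothetical, it assumes one process per CPU, and it assumes nodes are considered in the order given.

```python
def nodes_allocated(nprocs, min_nodes, cpus_per_node):
    """Return how many nodes are needed to satisfy both -n and -N.

    cpus_per_node lists the working CPUs of each candidate node, in the
    (assumed) order in which nodes are considered.
    """
    total_cpus = 0
    for count, cpus in enumerate(cpus_per_node, start=1):
        total_cpus += cpus
        # Stop once the node minimum is met and every process has a CPU.
        if count >= min_nodes and total_cpus >= nprocs:
            return count
    raise RuntimeError("not enough CPUs available for the request")


# Models -n32 -N16 on a hypothetical cluster where 6 nodes have only
# 1 working CPU: more than 16 nodes are allocated.
cluster = [2] * 14 + [1] * 6
print(nodes_allocated(32, 16, cluster))
```

Inside a running job, the real node count would be read from SLURM_NNODES rather than computed this way.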

CPU overcommitment

By default, srun never allocates more than one process per CPU. If you intend to assign multiple processes per CPU, you must invoke the srun -O option along with -n and -N. Thus, -n16 -N4 -O together allow 2 processes per CPU on the 4 allocated 2-CPU nodes.
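
The load implied by such a request can be checked with a few lines of Python. The helper names below are hypothetical, and the model assumes uniform nodes and a job confined to exactly the requested node count.

```python
def processes_per_cpu(nprocs, nnodes, cpus_per_node):
    """Average number of processes per CPU on exactly nnodes uniform nodes."""
    total_cpus = nnodes * cpus_per_node
    if total_cpus < 1:
        raise ValueError("the request must include at least one CPU")
    return nprocs / total_cpus


def needs_overcommit(nprocs, nnodes, cpus_per_node):
    """True when the processes cannot fit at one per CPU, so -O would be required."""
    return processes_per_cpu(nprocs, nnodes, cpus_per_node) > 1


# Models -n16 -N4 on 2-CPU nodes: 16 processes share 8 CPUs.
print(processes_per_cpu(16, 4, 2), needs_overcommit(16, 4, 2))
```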

Inconsistent allocation

srun rejects as errors inconsistent -n/-N combinations. For example, -n15 -N16 requests the impossible assignment of 15 processes to 16 nodes.
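
A script that builds srun command lines could catch this case before submitting. The check below is a minimal sketch with a hypothetical function name; it covers only the inconsistency described above (fewer processes than nodes), not every condition srun itself may reject.

```python
def validate_request(nprocs, nnodes):
    """Raise ValueError if there are fewer processes than requested nodes."""
    if nprocs < nnodes:
        raise ValueError(
            f"cannot assign {nprocs} processes to {nnodes} nodes: "
            "each node needs at least one process"
        )


validate_request(16, 8)       # consistent, returns silently
try:
    validate_request(15, 16)  # models -n15 -N16
except ValueError as err:
    print("rejected:", err)
```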

-c cpt (--cpus-per-task=cpt)

The -c cpt option assigns cpt CPUs per process for this job (default is one CPU per process). This option supports multithreaded programs that require more than a single CPU per process for best performance.

For multithreaded programs where the density of CPUs is more important than a specific node count, use both -n and -c on the srun execute line rather than -N. The options -n16 and -c2 result in whatever node allocation is needed to yield the requested 2 CPUs/process. This is the reverse of CPU overcommitment (see -N and -O options).
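
To estimate the node allocation that -n and -c imply, a rough model is enough. The sketch below assumes uniform nodes and assumes that all CPUs of one process must sit on the same node; the function name is hypothetical.

```python
import math


def nodes_for_cpus_per_task(ntasks, cpus_per_task, cpus_per_node):
    """Estimate the nodes needed when each task gets cpus_per_task CPUs."""
    # Assumption: a task's CPUs cannot be split across nodes.
    tasks_per_node = cpus_per_node // cpus_per_task
    if tasks_per_node == 0:
        raise ValueError("a single node has too few CPUs for one task")
    return math.ceil(ntasks / tasks_per_node)


# Models -n16 -c2 on 2-CPU nodes and on 4-CPU nodes.
print(nodes_for_cpus_per_task(16, 2, 2))
print(nodes_for_cpus_per_task(16, 2, 4))
```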

-p part (--partition=part)

The -p part option requests nodes only from the part partition. The default partition is assigned by the system administrator.

-t minutes (--time=minutes)

The -t minutes option allocates a total number of minutes for this job to run (default is the current partition's time limit). If the number of minutes exceeds the partition's time limit, then the job never leaves the PENDING state. When the time limit has been reached, SLURM sends each job process SIGTERM followed (after a pause specified by SLURM's KillWait configuration parameter) by SIGKILL.

-T nthreads (--threads=nthreads)

The -T nthreads option requests that srun allocate nthreads threads to initiate and control the parallel tasks in this job. The default is the smaller of either 10 or the number of nodes actually allocated, SLURM_NNODES.
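
The default rule is simple enough to state as code. In this sketch the function name is hypothetical, and the fallback value of 1 when SLURM_NNODES is unset is an assumption made only so the example also runs outside a job.

```python
import os


def default_srun_threads(nnodes):
    """Default for -T as described above: the smaller of 10 and the allocated node count."""
    return min(10, nnodes)


# SLURM_NNODES is set by SLURM inside a job; outside a job this falls back to 1.
nnodes = int(os.environ.get("SLURM_NNODES", "1"))
print(default_srun_threads(nnodes))
```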

6-6

Using SLURM
