
To illustrate how the external scheduler is used to launch an application, consider the following
command line, which launches an application on ten nodes with one task per node:

$ bsub -n 10 -ext "SLURM[nodes=10]" srun my_app

The following command line launches the same application, also on ten nodes, but stipulates that node n16 should not be used:

$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" srun my_app

7.1.3 Notes on LSF-HPC

The following are noteworthy items for users of LSF-HPC on HP XC systems:

• You must run jobs as a non-root user, such as lsfadmin or any other local user; do not run jobs as the root user.

• A SLURM partition named lsf is used to manage LSF jobs. You can view information about this partition with the sinfo command.
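For example, the following command restricts the sinfo report to that partition (the -p option is standard sinfo usage; the partition name comes from the HP XC configuration):

$ sinfo -p lsf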

• LSF daemons run on only one node in the HP XC system. As a result, the lshosts and bhosts commands list only one host, which represents all the resources of the HP XC system. The total number of CPUs for that host should be equal to the total number of CPUs found in the nodes assigned to the SLURM lsf partition.

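To see this single-host view, run the standard LSF host commands; each should report one host whose CPU total matches the lsf partition (output is omitted here):

$ lshosts
$ bhosts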

• When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM allocation and adds several standard LSF environment variables to the environment in which the job is to be run. The following two environment variables are also added:

SLURM_JOBID
This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here:

$ squeue --jobs $SLURM_JOBID

SLURM_NPROCS
This environment variable passes along the total number of tasks requested with the bsub -n command to all subsequent srun commands. User scripts can override this value with the srun -n command, but the new value must be less than or equal to the original number of requested tasks.
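As a sketch of how these two variables interact, consider the following hypothetical job script (the script and application names are placeholders). If it is submitted with bsub -n 4 -ext "SLURM[nodes=4]" ./myjob.sh, the first srun command starts four tasks and the second overrides that count downward:

#!/bin/sh
# myjob.sh -- hypothetical example
squeue --jobs $SLURM_JOBID    # query the allocation LSF-HPC created for this job
srun ./my_app                 # starts SLURM_NPROCS tasks (four in this example)
srun -n 2 ./my_app            # overrides the task count; must not exceed bsub -n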

• LSF-HPC dispatches all jobs locally. The default installation of LSF-HPC for SLURM on the HP XC system provides a job starter script that is configured for use by all LSF-HPC queues. This job starter script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation. Then, the job starter script uses the srun command to launch the user task on the first node in the allocation.

If this job starter script is not configured for a queue, user jobs begin execution locally on the LSF-HPC execution host. In this case, it is recommended that the user job use one or more srun commands to make use of the resources allocated to the job. Work done on the LSF-HPC execution host competes for CPU time with the LSF-HPC daemons and could affect the overall performance of LSF-HPC on the HP XC system.
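For illustration only, a job starter script of this kind might look roughly like the following outline; this is a hypothetical sketch, not the script shipped with HP XC:

#!/bin/sh
# Hypothetical job starter sketch: rebuild the LSF host list from the
# SLURM allocation, then launch the user command within it.
LSB_HOSTS=$(srun /bin/hostname | sort | tr '\n' ' ')
export LSB_HOSTS
exec srun -N 1 -n 1 "$@"   # run the user task on a single node of the allocation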

The bqueues -l command displays the full queue configuration, including whether or not a job starter script has been configured. See the Platform LSF documentation or the bqueues(1) manpage for more information on the use of this command.
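For instance, assuming the default queue is named normal, the following command displays its full configuration; the JOB_STARTER field appears in the output when a job starter script is set:

$ bqueues -l normal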

For example, consider an LSF-HPC configuration in which node n20 is the LSF-HPC execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal …
