HP XC System 2.x Software User Manual

2.3 Launching and Managing Jobs Quick Start

This section briefly describes some of the many ways to launch jobs, manage jobs, and get
information about jobs on an HP XC system. It is intended only as a quick overview of basic
ways to run and manage jobs. Full details about the HP XC job launch environment are provided
in the SLURM chapter (Chapter 6) and the LSF chapter (Chapter 7) of this manual.

2.3.1 Introduction

As described in Section 1.4, SLURM and LSF cooperate to run and manage jobs on the HP
XC system, combining LSF’s powerful and flexible scheduling functionality with SLURM’s
scalable parallel job launching capabilities.

SLURM is the low-level resource manager and job launcher, and performs processor allocation
for jobs. LSF gathers information about the cluster from SLURM. When a job is ready to be
launched, LSF creates a SLURM node allocation and dispatches the job to that allocation.

Although jobs can be launched directly using SLURM, it is recommended that you use LSF
to take advantage of its scheduling and job management capabilities. SLURM options can be
added to the LSF job launch command line to further define job launch requirements. The
HP-MPI mpirun command and its options can be used within LSF to launch jobs that require
MPI’s high-performance message-passing capabilities.
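The three launch styles mentioned above can be sketched as follows. This is an illustrative dry run, not a definitive recipe: the submit helper, the DRY_RUN variable, and the ./hello_world program are hypothetical, and the -ext "SLURM[...]" and mpirun -srun forms are assumptions about the HP XC conventions that should be checked against Chapters 6 and 7 before use.

```shell
# Hypothetical helper: print the bsub command line when DRY_RUN is set,
# otherwise pass the arguments to bsub unchanged.
submit() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "bsub $*"
    else
        bsub "$@"
    fi
}

# Keep this sketch side-effect free: only print what would be submitted.
DRY_RUN=1

# Plain LSF submission: request 4 processors and launch with SLURM's srun.
submit -n 4 srun hostname

# Assumed syntax: SLURM options passed through LSF's external scheduler option.
submit -n 4 -ext "SLURM[nodes=4]" srun hostname

# Assumed syntax: HP-MPI mpirun launching through srun inside the LSF job.
submit -n 4 mpirun -srun ./hello_world
```

Unsetting DRY_RUN makes the helper call bsub directly, so the same lines can be reused once the printed command lines look right.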

When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF
jobs. This partition is called the lsf partition.

When a job is submitted to LSF, the LSF scheduler prioritizes the job and waits until the
required resources (compute nodes from the lsf partition) are available.

When the requested resources are available for the job, LSF-HPC creates a SLURM allocation
of nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the
job with the LSF-HPC JOB_STARTER script to the first allocated node.

A detailed explanation of how SLURM and LSF interact to launch and manage jobs is provided
in Section 7.1.4.

2.3.2 Getting Information About Queues

The LSF bqueues command lists the configured job queues in LSF. By default, bqueues
returns the following information about all queues: queue name, queue priority, queue status,
job slot statistics, and job state statistics.

To get information about queues, enter the bqueues command as follows:

$ bqueues

Refer to Section 7.3.4 for more information about using this command and a sample of its output.

2.3.3 Getting Information About Resources

The LSF bhosts, lshosts, and lsload commands are quick ways to get information about
system resources. LSF daemons run on only one node in the HP XC system, so the bhosts
and lshosts commands will list one host, which represents all the resources of the HP
XC system. The total number of processors for that host should be equal to the total number of
processors assigned to the SLURM lsf partition.
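One way to look at both views side by side is sketched below, under some assumptions: run_if_present is a hypothetical helper, and sinfo -p lsf is assumed to be the appropriate SLURM query for the lsf partition on your installation. The sketch only runs commands that are actually installed, so it is safe to try on a machine where the LSF or SLURM environment is not set up.

```shell
# Hypothetical helper: run a command only if it is on PATH (or defined as a
# shell function); otherwise print a note to stderr and return 127.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
    else
        echo "skipped: $1 not found" >&2
        return 127
    fi
}

# LSF view: a single host entry that stands for the whole HP XC system.
run_if_present bhosts  || true
run_if_present lshosts || true
run_if_present lsload  || true

# SLURM view (assumed query): the nodes that make up the lsf partition.
run_if_present sinfo -p lsf || true
```

Comparing the processor count that lshosts reports for the single LSF host with the processors SLURM reports for the lsf partition is a quick sanity check that the two agree.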

The LSF bhosts command provides a summary of the jobs on the system and information
about the current state of LSF.

$ bhosts

Refer to Section 7.3.1 for more information about using this command and a sample of
its output.
