HP XC System 2.x Software User Manual


7 Using LSF

The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system
resource manager used on the HP XC system. LSF is included with HP XC, and is an integral
part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which
places the job in a queue and allows it to run when the necessary resources become available.
In addition to launching jobs, LSF provides extensive job management and information
capabilities. LSF schedules, launches, controls, and tracks jobs that are submitted to it according
to the policies established by the HP XC site administrator.

This chapter describes the functionality of LSF in an HP XC system, and discusses how
to use some basic LSF commands to submit jobs, manage jobs, and access job information.
The following topics are discussed:

Introduction to LSF on HP XC (Section 7.1)

Determining the LSF execution host (Section 7.2)

Determining available LSF resources (Section 7.3)

Submitting jobs to LSF (Section 7.4)

Getting information about LSF jobs (Section 7.5)

Working interactively within an LSF-HPC allocation (Section 7.6)

LSF Equivalents of SLURM options (Section 7.7)

For full information about LSF, refer to the standard LSF documentation set, which is described
in the Related Information section of this manual. LSF manpages are also available online on
the HP XC system.

7.1 Introduction to LSF in the HP XC Environment

This section introduces you to LSF in the HP XC environment. It provides an overview of how
LSF works, and discusses some of the features and differences of standard LSF compared to
LSF on an HP XC system. This section also contains an important discussion of how LSF and
SLURM work together to provide the HP XC job management environment. A description of
SLURM is provided in Chapter 6.

7.1.1 Overview of LSF

LSF is a batch system resource manager. In the HP XC environment, LSF manages just one
resource: the total number of HP XC processors designated for batch processing. The HP
XC system is based on dedicating processors to jobs, and LSF is implemented to use these
processors in the most efficient manner.

As jobs are submitted to LSF, LSF places the jobs in queues and determines an overall priority
for launching the jobs. When the required number of HP XC processors becomes available to
launch the next job, LSF reserves them and launches the job on these processors. When a job is
completed, LSF returns job output, job information, and any errors.
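
The queue-and-dispatch behavior described above can be illustrated with a small Python model. This is only a conceptual sketch, not LSF code or an LSF API: the names Job, ProcessorPool, submit, dispatch, and finish are hypothetical, and the priority rule (highest number first, submission order on ties) is an assumption made for illustration. Actual scheduling follows the policies set by the site administrator.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int         # number of processors the job requires
    priority: int = 0  # assumed rule: a higher value is launched first


class ProcessorPool:
    """Toy model of a single managed resource: a count of processors."""

    def __init__(self, total_procs):
        self.free = total_procs
        self.pending = []   # queued jobs, highest priority first
        self.running = {}   # job name -> Job

    def submit(self, job):
        self.pending.append(job)
        # list.sort is stable, so equal priorities keep submission order.
        self.pending.sort(key=lambda j: -j.priority)
        self.dispatch()

    def dispatch(self):
        # Launch the next job only when enough processors are free for it.
        while self.pending and self.pending[0].procs <= self.free:
            job = self.pending.pop(0)
            self.free -= job.procs      # reserve the processors
            self.running[job.name] = job

    def finish(self, name):
        # A completed job releases its processors, which may let
        # the next queued job start.
        job = self.running.pop(name)
        self.free += job.procs
        self.dispatch()
        return job


if __name__ == "__main__":
    pool = ProcessorPool(total_procs=8)
    pool.submit(Job("a", procs=6))
    pool.submit(Job("b", procs=4))              # must wait: only 2 free
    pool.submit(Job("c", procs=2, priority=5))  # fits and outranks "b"
    pool.finish("a")                            # frees 6, so "b" can start
    print(sorted(pool.running), pool.free)
```

In this model a job that does not fit holds its place at the head of the queue until enough processors are released; a real scheduler may apply other policies.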

A standard LSF installation on an HP XC system would consist of LSF daemons running
on every node and providing activity and resource information for each node. LSF-HPC for
SLURM on an HP XC system consists of one node running LSF-HPC daemons, and these
daemons communicate with SLURM for resource information about the other nodes. LSF-HPC
consolidates this resource information into one "virtual" node. Thus LSF-HPC integrated with
