Chapter 2. technical overview, Runtime daemon, Libraries – PAR Technologies PARASTATION5 V5 User Manual


ParaStation5 Administrator's Guide


Chapter 2. Technical overview

This chapter gives a brief technical overview of ParaStation5 and explains the software modules that
constitute it.

2.1. Runtime daemon

In order to enable ParaStation5 on a cluster, the ParaStation daemon psid(8) has to be installed on each
cluster node. This daemon process implements various functions:

• Install and configure local communication devices and protocols, e.g. load the p4sock kernel module and
set up proper routing information, if not already done at system startup.

• Queue parallel and serial tasks until requested resources are available.

• Distribute processes onto the available cluster nodes.

• Start up and monitor processes on cluster nodes. Also terminate and clean up processes upon request.

• Monitor the availability of other cluster nodes and send “I'm alive” messages.

• Handle input/output and signal forwarding.

• Service management commands from the administration tools.
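
The queueing and distribution functions listed above can be illustrated with a minimal sketch. It is not the
psid implementation: the Task and Scheduler classes, the per-node slot model, the FIFO order and the
fill-nodes-in-name-order placement are all assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    num_procs: int  # number of processes the task requests


@dataclass
class Scheduler:
    # Hypothetical resource model: each node offers a number of free process slots.
    free_slots: dict[str, int]
    queue: deque = field(default_factory=deque)

    def submit(self, task: Task) -> None:
        """Queue a task; it starts only once enough slots are free."""
        self.queue.append(task)

    def schedule(self) -> list[tuple[str, list[str]]]:
        """Start queued tasks in FIFO order while the free slots suffice.

        Returns (task name, node chosen for each process) per started task.
        """
        started = []
        while self.queue and self.queue[0].num_procs <= sum(self.free_slots.values()):
            task = self.queue.popleft()
            placement = []
            # Assumed placement policy: fill nodes in name order.
            for node in sorted(self.free_slots):
                while self.free_slots[node] > 0 and len(placement) < task.num_procs:
                    self.free_slots[node] -= 1
                    placement.append(node)
            started.append((task.name, placement))
        return started
```

With two nodes of two slots each, a task requesting three processes starts immediately, while a second task
requesting two processes stays queued until slots are released.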

The daemon processes periodically send status information, such as running application processes and
system load, to all other nodes within the cluster. Each daemon is therefore able to monitor every other
node. If alive messages from a node are missing, the daemon initiates proper actions, e.g. terminates a
parallel task or marks the node as "no longer available". If a previously unavailable node responds again,
it is marked as "available" and is used for upcoming parallel tasks. No intervention of the system
administrator is required.
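
The availability tracking described above can be sketched as follows. This is an illustration under stated
assumptions, not the psid protocol: the NodeMonitor class, its timeout parameter and the use of explicit
timestamps are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeMonitor:
    timeout: float  # hypothetical: seconds of silence before a node counts as unavailable
    last_seen: dict[str, float] = field(default_factory=dict)
    available: dict[str, bool] = field(default_factory=dict)

    def on_alive(self, node: str, now: float) -> None:
        """Record an alive message; a node that responds again becomes available."""
        self.last_seen[node] = now
        self.available[node] = True

    def check(self, now: float) -> list[str]:
        """Mark nodes whose alive messages are overdue; return the newly lost nodes."""
        lost = []
        for node, seen in self.last_seen.items():
            if self.available[node] and now - seen > self.timeout:
                self.available[node] = False
                lost.append(node)
        return lost
```

A caller would invoke check() periodically and react to the returned nodes, for example by terminating
the parallel tasks that run on them.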

2.2. Libraries

In addition, several libraries providing communication and management functionality must be installed.
All libraries are provided both as static versions, which are linked into the application at build time, and
as shared (dynamic) versions, which are referenced at build time and loaded at runtime. A set of
management and test tools is also installed on the cluster.

ParaStation5 comes with its own version of MPI, based on MPICH2. The MPI library provides standard
MPICH2-compatible MPI functions. For communication purposes, it supports several communication
paths in parallel, e.g. local communication using shared memory, TCP or p4sock, Ethernet using p4sock
and TCP, InfiniBand using verbs, Myrinet using GM or 10G Ethernet using DAPL. Thus, ParaStation5 is
able to spawn parallel tasks across nodes connected by different communication networks. ParaStation
will also make use of redundant interconnects if a failure is encountered during startup of a parallel task.
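
The fallback to another interconnect during startup can be pictured with the following sketch. The
preference order, the path names and the select_path function are assumptions for illustration; the order
in which ParaStation5 actually tries its communication paths is not stated here.

```python
from typing import Callable, Optional, Sequence

# Hypothetical preference order; the names echo the communication paths listed above.
PREFERENCE = ("shm", "verbs", "gm", "dapl", "p4sock", "tcp")


def select_path(
    connect: Callable[[str], bool],
    preference: Sequence[str] = PREFERENCE,
) -> Optional[str]:
    """Return the first path whose connection attempt succeeds, or None."""
    for path in preference:
        try:
            if connect(path):
                return path
        except OSError:
            # A failing interconnect is skipped in favour of the next candidate.
            continue
    return None
```

The connect callable stands in for whatever connection attempt a real transport would make; it returns
False or raises OSError when a path is unusable.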

Different versions of the ParaStation MPI library are available, depending on the hardware
architecture and compiler in use. For IA32, versions for the GNU, Intel and Portland Group compilers are
available. For x86_64, versions for the GCC, Intel, Portland Group and PathScale EKO compiler suites are
available. The versions support all available languages and language options for the selected compiler,
e.g. Fortran, Fortran90, C or C++. The different versions of the MPI library can be installed in parallel, so
it is possible to compile and run applications using different compilers on the same node.

2.3. Kernel modules

Besides libraries enabling efficient communication and task management, ParaStation5 also provides a set
of kernel modules:
