
1 Introduction

1.1 Static mapping

The efficient execution of a parallel program on a parallel machine requires that the communicating processes of the program be assigned to the processors of the machine so as to minimize its overall running time. When processes have a limited duration and their logical dependencies are accounted for, this optimization problem is referred to as scheduling. When processes are assumed to coexist simultaneously for the entire duration of the program, it is referred to as mapping. It amounts to balancing the computational weight of the processes among the processors of the machine, while reducing the cost of communication by keeping intensively inter-communicating processes on nearby processors. In most cases, the underlying computational structure of the parallel programs to map can be conveniently modeled as a graph in which vertices correspond to processes that handle distributed pieces of data, and edges reflect data dependencies. The mapping problem can then be addressed by assigning processor labels to the vertices of the graph, so that all processes assigned to some processor are loaded and run on it. In an SPMD context, this is equivalent to the distribution across processors of the data structures of parallel programs; in this case, all pieces of data assigned to some processor are handled by a single process located on this processor.
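
As an illustration of this cost model, the sketch below evaluates one candidate mapping of a small process graph: it accumulates the computational load assigned to each processor, and sums over all edges the communication weight multiplied by the distance between the processors to which the two end vertices are mapped. All names and data structures here are hypothetical illustrations, not part of the Scotch API; the graph is stored in a compressed adjacency form, and dist() stands for the routing distance of the target architecture.

#include <stdio.h>

/* Hypothetical sketch, not the Scotch API: evaluate a candidate
** mapping of a process graph (here, a 4-cycle) onto 2 processors. */

#define NVERT 4                           /* processes (graph vertices) */
#define NPROC 2                           /* processors                 */

static const int xadj[NVERT + 1] = { 0, 2, 4, 6, 8 };    /* CSR-like   */
static const int adjncy[8] = { 1, 2, 0, 3, 0, 3, 1, 2 }; /* adjacency  */
static const int vwght[NVERT] = { 1, 1, 1, 1 };   /* computation costs */
static const int ewght[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };  /* comm costs */

/* Distance between processors in the target topology; a mesh or
** hypercube architecture would yield larger values for far pairs.    */
static int
dist (int p0, int p1)
{
  return ((p0 == p1) ? 0 : 1);
}

int
main (void)
{
  int map[NVERT] = { 0, 0, 1, 1 };        /* candidate mapping          */
  int load[NPROC] = { 0 };
  int commcost = 0;

  for (int v = 0; v < NVERT; v ++) {
    load[map[v]] += vwght[v];             /* accumulate processor load  */
    for (int e = xadj[v]; e < xadj[v + 1]; e ++)
      commcost += ewght[e] * dist (map[v], map[adjncy[e]]);
  }
  commcost /= 2;                          /* each edge counted twice    */

  printf ("loads: %d %d, communication cost: %d\n",
          load[0], load[1], commcost);
  return (0);
}

A mapper then searches for the map[] array that keeps the loads balanced while making this communication cost as small as possible.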

A mapping is called static if it is computed prior to the execution of the program.

Static mapping is NP-complete in the general case [13]. Therefore, many studies have been carried out in order to find sub-optimal solutions in reasonable time, including the development of specific algorithms for common topologies such as the hypercube [11, 21]. When the target machine is assumed to have a communication network in the shape of a complete graph, the static mapping problem turns into the partitioning problem, which has also been intensely studied [4, 22, 31, 33, 49]. However, when mapping onto parallel machines whose communication network is not a bus, not accounting for the topology of the target machine usually leads to worse running times, because simple cut minimization can induce more expensive long-distance communication [21, 56].

1.2 Sparse matrix ordering

Many scientific and engineering problems can be modeled by sparse linear systems, which are solved either by iterative or by direct methods. To achieve efficiency with direct methods, one must minimize the fill-in induced by factorization. This fill-in is a direct consequence of the order in which the unknowns of the linear system are numbered, and its effects are critical in terms of both memory and computation costs.
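
A classical textbook illustration of this phenomenon, not taken from this manual, is the arrowhead matrix, whose adjacency graph is a star. If the center unknown is numbered first, eliminating it creates non-zero terms between all of its neighbors, so the factor becomes completely dense; if it is numbered last, factorization produces no fill-in at all:

\[
\begin{pmatrix}
\times & \times & \times & \times \\
\times & \times & f      & f      \\
\times & f      & \times & f      \\
\times & f      & f      & \times
\end{pmatrix}
\qquad
\begin{pmatrix}
\times &        &        & \times \\
       & \times &        & \times \\
       &        & \times & \times \\
\times & \times & \times & \times
\end{pmatrix}
\]

where $\times$ denotes an original non-zero term and $f$ a term filled in by factorization.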

An efficient way to compute fill reducing orderings of symmetric sparse matrices is to use recursive nested dissection [17]. It amounts to computing a vertex set S that separates the graph into two parts A and B, ordering S with the highest indices that are still available, and proceeding recursively on parts A and B until their sizes become smaller than some threshold value. This ordering guarantees that, at each step, no non-zero term can appear in the factorization process between unknowns of A and unknowns of B.
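
To make the recursive scheme above concrete, here is a minimal, self-contained sketch in C. It orders the simplest possible graph, a chain of unknowns, for which the middle vertex is trivially a separator; an actual implementation would compute separators of arbitrary graphs with dedicated heuristics. All names are illustrative.

#include <stdio.h>

#define THRESHOLD 3     /* at or below this size, order a part directly */

/* Order the sub-chain of unknowns lo .. hi - 1. next is the highest
** index still available for this part; the new (lower) value of next
** is returned, so that the sibling part uses the remaining indices.   */
static int
nestedDissection (int lo, int hi, int next, int * perm)
{
  int n = hi - lo;

  if (n <= THRESHOLD) {                   /* small part: order directly */
    for (int v = lo; v < hi; v ++)
      perm[v] = next --;
    return (next);
  }

  int sep = lo + n / 2;                   /* middle vertex separates chain   */
  perm[sep] = next --;                    /* S gets highest available index  */

  next = nestedDissection (lo, sep, next, perm);     /* recurse on part A */
  next = nestedDissection (sep + 1, hi, next, perm); /* recurse on part B */
  return (next);
}

int
main (void)
{
  int perm[15];

  nestedDissection (0, 15, 14, perm);     /* 15 unknowns, indices 0 .. 14 */
  for (int v = 0; v < 15; v ++)
    printf ("unknown %2d gets index %2d\n", v, perm[v]);
  return (0);
}

Running this sketch shows that the middle unknown receives the highest index, 14, the separators of the two halves receive the next highest indices available within their respective ranges, and so on down the recursion.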

The main issue of the nested dissection ordering algorithm is thus to find small vertex separators that balance the remaining subgraphs as evenly as possible, in order to minimize fill-in and to increase concurrency in the factorization process.
