3algorithms, 1 static mapping by dual recursive bipartitioning – Scotch Brand 5.1.10 User Manual

Page 8

Advertising
background image

The free/libre software license under which Scotch 5.1 is distributed is

the CeCILL-C license [6], which has basically the same features as the GNU
LGPL (“Lesser General Public License”): ability to link the code as a library
to any free/libre or even proprietary software, ability to modify the code and to
redistribute these modifications. Version 4.0 of Scotch was distributed under the
LGPL itself.

Please refer to section 8 to see how to obtain the free/libre distribution of

Scotch

.

3

Algorithms

3.1

Static mapping by Dual Recursive Bipartitioning

For a detailed description of the mapping algorithm and an extensive analysis of its
performance, please refer to [41, 44]. In the next sections, we will only outline the
most important aspects of the algorithm.

3.1.1

Static mapping

The parallel program to be mapped onto the target architecture is modeled by a val-
uated unoriented graph S called source graph or process graph, the vertices of which
represent the processes of the parallel program, and the edges of which the commu-
nication channels between communicating processes. Vertex- and edge- valuations
associate with every vertex v

S

and every edge e

S

of S integer numbers w

S

(v

S

) and

w

S

(e

S

) which estimate the computation weight of the corresponding process and

the amount of communication to be transmitted on the channel, respectively.

The target machine onto which is mapped the parallel program is also modeled

by a valuated unoriented graph T called target graph or architecture graph. Vertices
v

T

and edges e

T

of T are assigned integer weights w

T

(v

T

) and w

T

(e

T

), which

estimate the computational power of the corresponding processor and the cost of
traversal of the inter-processor link, respectively.

A mapping from S to T consists of two applications τ

S,T

: V (S) −→ V (T ) and

ρ

S,T

: E(S) −→ P(E(T )), where P(E(T )) denotes the set of all simple loopless

paths which can be built from E(T ). τ

S,T

(v

S

) = v

T

if process v

S

of S is mapped

onto processor v

T

of T , and ρ

S,T

(e

S

) = {e

1

T

, e

2

T

, . . . , e

n

T

} if communication channel

e

S

of S is routed through communication links e

1

T

, e

2

T

, . . . , e

n

T

of T . |ρ

S,T

(e

S

)|

denotes the dilation of edge e

S

, that is, the number of edges of E(T ) used to route

e

S

.

3.1.2

Cost function and performance criteria

The computation of efficient static mappings requires an a priori knowledge of the
dynamic behavior of the target machine with respect to the programs which are
run on it. This knowledge is synthesized in a cost function, the nature of which
determines the characteristics of the desired optimal mappings. The goal of our
mapping algorithm is to minimize some communication cost function, while keeping
the load balance within a specified tolerance. The communication cost function f

C

that we have chosen is the sum, for all edges, of their dilation multiplied by their
weight:

f

C

S,T

, ρ

S,T

)

def

=

e

S

∈E(S)

w

S

(e

S

) |ρ

S,T

(e

S

)| .

8

Advertising