Scotch Brand 5.1.10 User Manual

Page 9

Advertising

This function, which has already been considered by several authors for hyper-
cube target topologies [11, 21, 25], has several interesting properties: it is easy
to compute, allows incremental updates performed by iterative algorithms, and its
minimization favors the mapping of intensively intercommunicating processes onto
nearby processors; regardless of the type of routage implemented on the target
machine (store-and-forward or cut-through), it models the traffic on the intercon-
nection network and thus the risk of congestion.

The strong positive correlation between values of this function and effective

execution times has been experimentally verified by Hammond [21] on the CM-2,
and by Hendrickson and Leland [26] on the nCUBE 2.

The quality of mappings is evaluated with respect to the criteria for quality that

we have chosen: the balance of the computation load across processors, and the
minimization of the interprocessor communication cost modeled by function f

These criteria lead to the definition of several parameters, which are described
below.

For load balance, one can define µ

map

, the average load per computational

power unit (which does not depend on the mapping), and δ

map

, the load imbalance

ratio, as

map

def

∈V (S)

)

∈V (T )

)

and

map

def

∈V (T )






)

∈ V (S)

S,T

) = v

)






− µ

map

∈V (S)

)

However, since the maximum load imbalance ratio is provided by the user in input
of the mapping, the information given by these parameters is of little interest, since
what matters is the minimization of the communication cost function under this
load balance constraint.

For communication, the straightforward parameter to consider is f

. It can be

normalized as µ

exp

, the average edge expansion, which can be compared to µ

dil

the average edge dilation; these are defined as

exp

def

∈E(S)

)

and

dil

def

∈E(S)

|ρ

S,T

|E(S)|

exp

def

exp

dil

is smaller than 1 when the mapper succeeds in putting heavily inter-

communicating processes closer to each other than it does for lightly communicating
processes; they are equal if all edges have same weight.

3.1.3

The Dual Recursive Bipartitioning algorithm

Our mapping algorithm uses a divide and conquer approach to recursively allocate
subsets of processes to subsets of processors [41]. It starts by considering a set of
processors, also called domain, containing all the processors of the target machine,
and with which is associated the set of all the processes to map. At each step, the
algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and

Advertising

Popular Brands

Popular manuals