Scotch Brand 5.1.10 User Manual

Page 16

Advertising
background image

vertex separators are computed by using direct heuristics [28, 37], or from edge
separators [48, and included references] by minimum cover techniques [9, 30], but
other techniques such as spectral vertex partitioning have also been used [49].

Provided that good vertex separators are found, the nested dissection algorithm

produces orderings which, both in terms of fill-in and operation count, compare
favorably [19, 31, 46] to the ones obtained with the minimum degree algorithm [39].
Moreover, the elimination trees induced by nested dissection are broader, shorter,
and better balanced, and therefore exhibit much more concurrency in the context
of parallel Cholesky factorization [3, 14, 15, 19, 46, 53, and included references].

3.2.3

Hybridization

Due to their complementary nature, several schemes have been proposed to
hybridize the two methods [28, 34, 46]. However, to our knowledge, only loose
couplings have been achieved: incomplete nested dissection is performed on the
graph to order, and the resulting subgraphs are passed to some minimum degree
algorithm. This results in the fact that the minimum degree algorithm does not
have exact degree values for all of the boundary vertices of the subgraphs, leading
to a misbehavior of the vertex selection process.

Our ordering program implements a tight coupling of the nested dissection and

minimum degree algorithms, that allows each of them to take advantage of the infor-
mation computed by the other. First, the nested dissection algorithm provides exact
degree values for the boundary vertices of the subgraphs passed to the minimum
degree algorithm (called halo minimum degree since it has a partial visibility of the
neighborhood of the subgraph). Second, the minimum degree algorithm returns the
assembly tree that it computes for each subgraph, thus allowing for supervariable
amalgamation, in order to obtain column-blocks of a size suitable for BLAS3 block
computations.

As for our mapping program, it is possible to combine ordering methods into

ordering strategies, which allow the user to select the proper methods with respect
to the characteristics of the subgraphs.

The ordering program is completely parametrized by its ordering strategy. The

nested dissection method allows the user to take advantage of all of the graph
partitioning routines that have been developed in the earlier stages of the Scotch
project. Internal ordering strategies for the separators are relevant in the case of
sequential or parallel [20, 50, 51, 52] block solving, to select ordering algorithms
that minimize the number of extra-diagonal blocks [7], thus allowing for efficient
use of BLAS3 primitives, and to reduce inter-processor communication.

3.2.4

Performance criteria

The quality of orderings is evaluated with respect to several criteria. The first
one, NNZ, is the number of non-zero terms in the factored reordered matrix. The
second one, OPC, is the operation count, that is the number of arithmetic operations
required to factor the matrix. The operation count that we have considered takes
into consideration all operations (additions, subtractions, multiplications, divisions)
required by Cholesky factorization, except square roots; it is equal to

c

n

2

c

, where

n

c

is the number of non-zeros of column c of the factored matrix, diagonal included.

A third criterion for quality is the shape of the elimination tree; concurrency in
parallel solving is all the higher as the elimination tree is broad and short. To

16

Advertising